MR. Guard: Multilingual Reasoning Guardrail using Curriculum Learning

Link: http://arxiv.org/abs/2504.15241v1

PDF Link: http://arxiv.org/pdf/2504.15241v1

Summary: Large Language Models (LLMs) are susceptible to adversarial attacks such as jailbreaking, which can elicit harmful or unsafe behaviors.

This vulnerability is exacerbated in multilingual settings, where multilingual safety-aligned data are often limited.

Thus, developing a guardrail capable of detecting and filtering unsafe content across diverse languages is critical for deploying LLMs in real-world applications.

In this work, we propose an approach to build a multilingual guardrail with reasoning.

Our method consists of: (1) synthetic multilingual data generation incorporating culturally and linguistically nuanced variants, (2) supervised fine-tuning, and (3) a curriculum-guided Group Relative Policy Optimization (GRPO) framework that further improves performance.
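
To make step (3) more concrete, here is a minimal sketch of the two ideas it combines. It is an illustration under stated assumptions, not the paper's implementation. GRPO samples a group of outputs per prompt and scores each output relative to the rest of its group, and a curriculum presents training examples in order of increasing difficulty. The function names, the `difficulty` field, and the use of the population standard deviation are all assumptions made for this example; the summary does not say how the paper defines difficulty or reward.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    GRPO-style advantage: (reward - group mean) / group std.
    `eps` guards against division by zero when all rewards are equal.
    Using the population std here is an assumption for this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def curriculum_order(examples, difficulty):
    """Return examples sorted from easiest to hardest.

    `difficulty` is a hypothetical scoring function; the source does not
    specify how difficulty is measured.
    """
    return sorted(examples, key=difficulty)
```

With rewards `[1.0, 0.0, 1.0, 0.0]` for four sampled outputs, the correct outputs receive an advantage close to +1 and the incorrect ones close to -1. When every output in a group gets the same reward, all advantages are zero, so that group contributes no learning signal.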

Experimental results demonstrate that our multilingual guardrail consistently outperforms recent baselines across both in-domain and out-of-domain languages.

The multilingual reasoning capability of our guardrail enables it to generate multilingual explanations, which are particularly useful for understanding language-specific risks and ambiguities in multilingual content moderation.

Published on arXiv on: 2025-04-21T17:15:06Z