Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models

Link: http://arxiv.org/abs/2507.02799v1

PDF Link: http://arxiv.org/pdf/2507.02799v1

Summary: Reasoning Language Models (RLMs) have gained traction for their ability to perform complex, multi-step reasoning tasks through mechanisms such as Chain-of-Thought (CoT) prompting or fine-tuned reasoning traces.

While these capabilities promise improved reliability, their impact on robustness to social biases remains unclear.

In this work, we leverage the CLEAR-Bias benchmark, originally designed for Large Language Models (LLMs), to investigate the adversarial robustness of RLMs to bias elicitation.

We systematically evaluate state-of-the-art RLMs across diverse sociocultural dimensions, using an LLM-as-a-judge approach for automated safety scoring and leveraging jailbreak techniques to assess the strength of built-in safety mechanisms.

Our evaluation addresses three key questions: (i) how the introduction of reasoning capabilities affects model fairness and robustness; (ii) whether models fine-tuned for reasoning exhibit greater safety than those relying on CoT prompting at inference time; and (iii) how the success rate of jailbreak attacks targeting bias elicitation varies with the reasoning mechanisms employed.

Our findings reveal a nuanced relationship between reasoning capabilities and bias safety.

Surprisingly, models with explicit reasoning, whether via CoT prompting or fine-tuned reasoning traces, are generally more vulnerable to bias elicitation than base models without such mechanisms, suggesting reasoning may unintentionally open new pathways for stereotype reinforcement.

Reasoning-enabled models appear somewhat safer than those relying on CoT prompting, which are particularly prone to contextual reframing attacks through storytelling prompts, fictional personas, or reward-shaped instructions.

These results challenge the assumption that reasoning inherently improves robustness and underscore the need for more bias-aware approaches to reasoning design.

Published on arXiv on: 2025-07-03T17:01:53Z