Link: http://arxiv.org/abs/2506.13726v1
PDF Link: http://arxiv.org/pdf/2506.13726v1
Summary: The introduction of advanced reasoning capabilities has improved the problem-solving performance of large language models, particularly on math and coding benchmarks. However, it remains unclear whether these reasoning models are more or less vulnerable to adversarial prompt attacks than their non-reasoning counterparts. In this work, we present a systematic evaluation of weaknesses in advanced reasoning models compared to similar non-reasoning models across a diverse set of prompt-based attack categories. Using experimental data, we find that on average the reasoning-augmented models are \emph{slightly more robust} than non-reasoning models (42.51\% vs. 45.53\% attack success rate, lower is better). However, this overall trend masks significant category-specific differences: for certain attack types the reasoning models are substantially \emph{more vulnerable} (e.g., up to 32 percentage points worse on a tree-of-attacks prompt), while for others they are markedly \emph{more robust} (e.g., 29.8 points better on cross-site scripting injection). Our findings highlight the nuanced security implications of advanced reasoning in language models and emphasize the importance of stress-testing safety across diverse adversarial techniques.
Published on arXiv on: 2025-06-16T17:32:18Z
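
A minimal sketch of the averaging effect described in the abstract: how an overall mean attack success rate (ASR) can hide large per-category reversals. The category names and numbers below are hypothetical placeholders, not data from the paper, and the functions are illustrative only.

```python
# Hypothetical per-category attack success rates (fractions, lower is better).
# These values are invented to illustrate the "average masks category differences" point.
reasoning_asr = {"tree_of_attacks": 0.80, "xss_injection": 0.20, "prompt_leak": 0.30}
baseline_asr  = {"tree_of_attacks": 0.48, "xss_injection": 0.50, "prompt_leak": 0.35}

def mean_asr(asr_by_category):
    """Unweighted mean ASR across attack categories."""
    return sum(asr_by_category.values()) / len(asr_by_category)

print(f"reasoning model mean ASR: {mean_asr(reasoning_asr):.2%}")
print(f"baseline model mean ASR:  {mean_asr(baseline_asr):.2%}")

# Per-category gaps in percentage points
# (positive = reasoning model is more vulnerable in that category).
for category in reasoning_asr:
    gap = (reasoning_asr[category] - baseline_asr[category]) * 100
    print(f"{category}: {gap:+.1f} pp")
```

With these made-up numbers the two models have similar overall means, yet the per-category gaps swing in opposite directions, which is the pattern the abstract reports for the real evaluation.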