
FuSaR: A Fuzzification-Based Method for LRM Safety-Reasoning Balance

Link: http://arxiv.org/abs/2508.12897v1

PDF Link: http://arxiv.org/pdf/2508.12897v1

Summary: Large Reasoning Models (LRMs) have demonstrated impressive performance across various tasks due to their powerful reasoning capabilities.

However, their safety performance remains a significant concern.

In this paper, we explore the reasons behind the vulnerability of LRMs.

Based on this, we propose a novel method to improve the safety of LLMs without sacrificing their reasoning capability.

Specifically, we exploit the competition between an LRM's reasoning ability and its safety ability, achieving jailbreaks by improving the LRM's reasoning performance so as to reduce its safety performance.

We then introduce an alignment strategy based on Fuzzification to balance Safety and Reasoning (FuSaR), which detoxifies the harmful reasoning process by hiding both the dangerous entities and the dangerous procedures in the reasoning steps.

FuSaR successfully mitigates safety risks while preserving core reasoning information.
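
For intuition, the sketch below illustrates one way such fuzzification of a reasoning trace could look: concrete dangerous entities are replaced by abstract placeholders and dangerous procedural steps are blurred, while the rest of the trace is kept. This is a minimal illustrative sketch under assumed details, not the paper's pipeline; the entity list, step markers, and the `fuzzify_reasoning` helper are all hypothetical.

```python
import re

# Hypothetical examples for illustration only; the paper's actual entity and
# procedure lists, and its fuzzification pipeline, are not specified here.
DANGEROUS_ENTITIES = {"thermite": "[SUBSTANCE_A]", "ransomware": "[SOFTWARE_A]"}
DANGEROUS_STEP_MARKERS = ("mix", "deploy", "inject")


def fuzzify_reasoning(trace: str) -> str:
    """Hide dangerous entities and blur dangerous procedural steps in a
    reasoning trace, while keeping the overall reasoning structure."""
    # Replace concrete dangerous entities with abstract placeholders.
    for entity, placeholder in DANGEROUS_ENTITIES.items():
        trace = re.sub(rf"\b{re.escape(entity)}\b", placeholder, trace,
                       flags=re.IGNORECASE)

    # Blur procedural detail: sentences describing dangerous actions are
    # replaced by a generic statement so the reasoning chain stays intact.
    fuzzified = []
    for sentence in re.split(r"(?<=[.!?])\s+", trace):
        if any(marker in sentence.lower() for marker in DANGEROUS_STEP_MARKERS):
            fuzzified.append("[A procedural step is omitted for safety.]")
        else:
            fuzzified.append(sentence)
    return " ".join(fuzzified)


if __name__ == "__main__":
    raw = "First, obtain thermite. Then mix the components and deploy the device."
    print(fuzzify_reasoning(raw))
```

The design intent, as described in the abstract, is that the detoxified trace can still serve as alignment training data because the high-level reasoning steps remain visible even though the harmful specifics are hidden.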

We validate this strategy through alignment experiments on several open-source LRMs using detoxified reasoning data.

The results, compared with existing baselines, conclusively show that FuSaR is an efficient alignment strategy that simultaneously enhances both the reasoning capability and the safety of LRMs.

Published on arXiv on: 2025-08-18T12:54:16Z