
RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability

Link: http://arxiv.org/abs/2504.10081v1

PDF Link: http://arxiv.org/pdf/2504.10081v1

Summary: Large Reasoning Models (LRMs), such as OpenAI o1 and DeepSeek-R1, have been rapidly progressing and achieving breakthrough performance on complex reasoning tasks such as mathematics and coding.

However, the open-source R1 models have raised safety concerns, such as a tendency to comply with malicious queries, which greatly limits the utility of these powerful models in real-world applications.

In this paper, we introduce RealSafe-R1 as safety-aligned versions of DeepSeek-R1 distilled models.

To train these models, we construct a dataset of 15k safety-aware reasoning trajectories generated by DeepSeek-R1, under explicit instructions for expected refusal behavior.
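The summary does not reproduce the authors' data-construction prompts, but the described step (sampling refusal trajectories from DeepSeek-R1 under an explicit instruction about expected refusal behavior) can be sketched roughly as below. The prompt wording, model identifier, endpoint, and function names are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch of the data-construction step described above:
# sample a safety-aware reasoning trajectory from a DeepSeek-R1 model
# served behind an OpenAI-compatible endpoint. Prompt text, model name,
# and endpoint are assumptions, not the paper's exact setup.
import json
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local serving endpoint
    api_key="EMPTY",
)

# Explicit instruction describing the expected refusal behavior.
GUIDANCE = (
    "The following query is harmful. Reason carefully about why it is unsafe, "
    "then refuse to help. Your final answer must be a clear, polite refusal."
)

def generate_refusal_trajectory(harmful_query: str) -> dict:
    """Generate one safety-aware reasoning trajectory for a harmful query."""
    response = client.chat.completions.create(
        model="deepseek-r1",  # assumed model identifier on the local server
        messages=[
            {"role": "system", "content": GUIDANCE},
            {"role": "user", "content": harmful_query},
        ],
        temperature=0.6,
    )
    completion = response.choices[0].message.content
    # Keep the query together with the model's own reasoning and refusal,
    # so that fine-tuning data stays within its generation distribution.
    return {"prompt": harmful_query, "response": completion}

if __name__ == "__main__":
    example = generate_refusal_trajectory("<harmful query from a red-teaming set>")
    print(json.dumps(example, indent=2))
```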

Both quantitative experiments and qualitative case studies demonstrate the models' improved safety guardrails against both harmful queries and jailbreak attacks.

Importantly, unlike prior safety alignment efforts that often compromise reasoning performance, our method preserves the models' reasoning capabilities by maintaining the training data within the original distribution of generation.
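The summary states only that these self-generated trajectories are used to fine-tune the distilled models; a plain supervised fine-tuning pass over such data could look like the sketch below, here using Hugging Face TRL. The data path, checkpoint choice, and hyperparameters are assumptions, not the paper's reported configuration.

```python
# Minimal supervised fine-tuning sketch over self-generated refusal
# trajectories stored as JSONL records in conversational format:
# {"messages": [{"role": ..., "content": ...}, ...]}.
# Paths, checkpoint name, and hyperparameters are illustrative only.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Trajectories produced by a sampling step like the sketch above.
dataset = load_dataset("json", data_files="realsafe_trajectories.jsonl", split="train")

trainer = SFTTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",  # one of the distilled checkpoints (assumed)
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="realsafe-r1-7b",
        num_train_epochs=2,              # assumed; not the paper's setting
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=1e-5,              # assumed; not the paper's setting
        bf16=True,
    ),
)
trainer.train()
```

Because the targets are the model's own generations rather than externally written refusals, this kind of fine-tuning nudges behavior toward refusal without pulling the model far from its original output distribution, which is the property the authors credit for preserving reasoning ability.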

Model weights of RealSafe-R1 are open-source at https://huggingface.co/RealSafe.

Published on arXiv: 2025-04-14T10:26:37Z