Link: http://arxiv.org/abs/2505.07584v1
PDF Link: http://arxiv.org/pdf/2505.07584v1
Summary: The increasing deployment of large language models in security-sensitive domains necessitates rigorous evaluation of their resilience against adversarial prompt-based attacks.
While previous benchmarks have focused on security evaluations with limited and predefined attack domains, such as cybersecurity attacks, they often lack a comprehensive assessment of intent-driven adversarial prompts and the consideration of real-life scenario-based multi-turn attacks.
To address this gap, we present SecReEvalBench, the Security Resilience Evaluation Benchmark, which defines four novel metrics: Prompt Attack Resilience Score, Prompt Attack Refusal Logic Score, Chain-Based Attack Resilience Score and Chain-Based Attack Rejection Time Score.
Moreover, SecReEvalBench employs six questioning sequences for model assessment: one-off attack, successive attack, successive reverse attack, alternative attack, sequential ascending attack with escalating threat levels and sequential descending attack with diminishing threat levels.
In addition, we introduce a dataset customized for the benchmark, which incorporates both neutral and malicious prompts, categorised across seven security domains and sixteen attack techniques.
In applying this benchmark, we systematically evaluate five state-of-the-art open-weighted large language models: Llama 3.1, Gemma 2, Mistral v0.3, DeepSeek-R1 and Qwen 3.
Our findings offer critical insights into the strengths and weaknesses of modern large language models in defending against evolving adversarial threats.
The SecReEvalBench dataset is publicly available at https://kaggle.com/datasets/5a7ee22cf9dab6c93b55a73f630f6c9b42e936351b0ae98fbae6ddaca7fe248d, which provides groundwork for advancing research in large language model security.
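To illustrate the kind of multi-turn evaluation the abstract describes, below is a minimal sketch of a successive (chain-based) attack sequence run against a chat model, recording the first turn at which the model refuses. This is not the benchmark's implementation: the `query_model` callable, the keyword-based refusal check and the stub model are hypothetical placeholders, and the paper's actual metrics (e.g. Chain-Based Attack Rejection Time Score) are defined there, not here.

```python
# Illustrative sketch only: run a multi-turn (successive) attack sequence and
# note the first turn at which the model refuses. query_model() and the naive
# refusal heuristic are hypothetical placeholders, not SecReEvalBench code.
from typing import Callable, Dict, List, Optional

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry")  # naive heuristic

def is_refusal(response: str) -> bool:
    """Very rough refusal detector used only for this sketch."""
    return any(marker in response.lower() for marker in REFUSAL_MARKERS)

def run_successive_attack(
    query_model: Callable[[List[Dict[str, str]]], str],
    attack_turns: List[str],
) -> Dict[str, object]:
    """Feed attack prompts turn by turn and record the first refusal turn."""
    history: List[Dict[str, str]] = []
    first_refusal_turn: Optional[int] = None
    for turn_index, prompt in enumerate(attack_turns, start=1):
        history.append({"role": "user", "content": prompt})
        response = query_model(history)
        history.append({"role": "assistant", "content": response})
        if first_refusal_turn is None and is_refusal(response):
            first_refusal_turn = turn_index
    return {
        "turns": len(attack_turns),
        "first_refusal_turn": first_refusal_turn,  # None: the chain was never refused
        "history": history,
    }

# Example usage with a stub model that refuses from the third turn onward:
if __name__ == "__main__":
    calls = {"n": 0}
    def stub_model(history: List[Dict[str, str]]) -> str:
        calls["n"] += 1
        return "I'm sorry, I can't help with that." if calls["n"] >= 3 else "Sure, here is some context..."
    result = run_successive_attack(stub_model, ["turn 1", "turn 2", "turn 3", "turn 4"])
    print(result["first_refusal_turn"])  # -> 3
```

A later refusal turn (or none at all) would indicate weaker resilience to escalating multi-turn pressure, which is the intuition behind chain-based scoring in the benchmark.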
Published on arXiv on: 2025-05-12T14:09:24Z