Link: http://arxiv.org/abs/2505.07584v1
PDF Link: http://arxiv.org/pdf/2505.07584v1
Summary: The increasing deployment of large language models in security-sensitive domains necessitates rigorous evaluation of their resilience against adversarial prompt-based attacks.
While previous benchmarks have focused on security evaluations with limited and predefined attack domains, such as cybersecurity attacks, they often lack a comprehensive assessment of intent-driven adversarial prompts and the consideration of real-life scenario-based multi-turn attacks.
To address this gap, we present SecReEvalBench, the Security Resilience Evaluation Benchmark, which defines four novel metrics: Prompt Attack Resilience Score, Prompt Attack Refusal Logic Score, Chain-Based Attack Resilience Score and Chain-Based Attack Rejection Time Score.
Moreover, SecReEvalBench employs six questioning sequences for model assessment: one-off attack, successive attack, successive reverse attack, alternative attack, sequential ascending attack with escalating threat levels and sequential descending attack with diminishing threat levels.
In addition, we introduce a dataset customized for the benchmark, which incorporates both neutral and malicious prompts, categorised across seven security domains and sixteen attack techniques.
In applying this benchmark, we systematically evaluate five state-of-the-art open-weighted large language models: Llama 3.1, Gemma 2, Mistral v0.3, DeepSeek-R1 and Qwen 3.
Our findings offer critical insights into the strengths and weaknesses of modern large language models in defending against evolving adversarial threats.
The SecReEvalBench dataset is publicly available at https://kaggle.com/datasets/5a7ee22cf9dab6c93b55a73f630f6c9b42e936351b0ae98fbae6ddaca7fe248d, which provides groundwork for advancing research in large language model security.
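To illustrate the kind of multi-turn evaluation the abstract describes, below is a minimal sketch of a successive (chain-based) attack sequence run against a chat model, recording the first turn at which the model refuses. This is not the benchmark's implementation: the `query_model` callable, the keyword-based refusal check and the stub model are hypothetical placeholders, and the paper's actual metrics (e.g. Chain-Based Attack Rejection Time Score) are defined there, not here.

```python
# Illustrative sketch only: run a multi-turn (successive) attack sequence and
# note the first turn at which the model refuses. query_model() and the naive
# refusal heuristic are hypothetical placeholders, not SecReEvalBench code.
from typing import Callable, Dict, List, Optional

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry")  # naive heuristic

def is_refusal(response: str) -> bool:
    """Very rough refusal detector used only for this sketch."""
    return any(marker in response.lower() for marker in REFUSAL_MARKERS)

def run_successive_attack(
    query_model: Callable[[List[Dict[str, str]]], str],
    attack_turns: List[str],
) -> Dict[str, object]:
    """Feed attack prompts turn by turn and record the first refusal turn."""
    history: List[Dict[str, str]] = []
    first_refusal_turn: Optional[int] = None
    for turn_index, prompt in enumerate(attack_turns, start=1):
        history.append({"role": "user", "content": prompt})
        response = query_model(history)
        history.append({"role": "assistant", "content": response})
        if first_refusal_turn is None and is_refusal(response):
            first_refusal_turn = turn_index
    return {
        "turns": len(attack_turns),
        "first_refusal_turn": first_refusal_turn,  # None: the chain was never refused
        "history": history,
    }

# Example usage with a stub model that refuses from the third turn onward:
if __name__ == "__main__":
    calls = {"n": 0}
    def stub_model(history: List[Dict[str, str]]) -> str:
        calls["n"] += 1
        return "I'm sorry, I can't help with that." if calls["n"] >= 3 else "Sure, here is some context..."
    result = run_successive_attack(stub_model, ["turn 1", "turn 2", "turn 3", "turn 4"])
    print(result["first_refusal_turn"])  # -> 3
```

A later refusal turn (or none at all) would indicate weaker resilience to escalating multi-turn pressure, which is the intuition behind chain-based scoring in the benchmark.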
Published on arXiv on: 2025-05-12T14:09:24Z