Link: http://arxiv.org/abs/2511.03247v1
PDF Link: http://arxiv.org/pdf/2511.03247v1
Summary: Open-weight models provide researchers and developers with accessible foundations for diverse downstream applications.
We tested the safety and security postures of eight open-weight large language models (LLMs) to identify vulnerabilities that may impact subsequent fine-tuning and deployment.
Using automated adversarial testing, we measured each model's resilience against single-turn and multi-turn prompt injection and jailbreak attacks.
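The abstract does not include the testing harness itself; as a rough, hedged sketch of what automated single-turn versus multi-turn probing can look like, the Python below compares the two attack modes (the `query_model` helper, the `violates_policy` classifier, and the escalating prompt sequence are hypothetical placeholders, not details from the paper):

```python
# Minimal sketch of an automated single-turn vs. multi-turn jailbreak probe.
# Assumptions (not from the paper): a `query_model(model, messages)` helper that
# returns the model's reply, and a `violates_policy(reply)` success classifier.

from typing import Callable, Dict, List

Message = Dict[str, str]
QueryFn = Callable[[str, List[Message]], str]
JudgeFn = Callable[[str], bool]


def single_turn_attack(model: str, attack_prompt: str,
                       query_model: QueryFn, violates_policy: JudgeFn) -> bool:
    """Send one adversarial prompt and check whether the reply breaks policy."""
    reply = query_model(model, [{"role": "user", "content": attack_prompt}])
    return violates_policy(reply)


def multi_turn_attack(model: str, turns: List[str],
                      query_model: QueryFn, violates_policy: JudgeFn) -> bool:
    """Escalate over several turns, keeping the full conversation history."""
    history: List[Message] = []
    for prompt in turns:
        history.append({"role": "user", "content": prompt})
        reply = query_model(model, history)
        history.append({"role": "assistant", "content": reply})
        if violates_policy(reply):
            return True   # guardrail broken at this turn
    return False          # model held the line across all turns


def attack_success_rate(results: List[bool]) -> float:
    """Fraction of attack attempts that elicited a policy-violating reply."""
    return sum(results) / len(results) if results else 0.0
```

Keeping the full conversation history is what lets later turns build on earlier, seemingly benign ones, which is one common reason multi-turn attacks outperform single-turn baselines.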
Our findings reveal pervasive vulnerabilities across all tested models, with multi-turn attacks achieving success rates between 25.86\% and 92.78\%, a $2\times$ to $10\times$ increase over single-turn baselines.
These results underscore a systemic inability of current open-weight models to maintain safety guardrails across extended interactions.
We assess that alignment strategies and lab priorities significantly influence resilience: capability-focused models such as Llama 3.3 and Qwen 3 demonstrate higher multi-turn susceptibility, whereas safety-oriented designs such as Google Gemma 3 exhibit more balanced performance.
The analysis concludes that open-weight models, while crucial for innovation, pose tangible operational and ethical risks when deployed without layered security controls.
These findings are intended to inform practitioners and developers of the potential risks and the value of professional AI security solutions to mitigate exposure.
Addressing multi-turn vulnerabilities is essential to ensure the safe, reliable, and responsible deployment of open-weight LLMs in enterprise and public domains.
We recommend adopting a security-first design philosophy and layered protections to ensure resilient deployments of open-weight models.
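The abstract does not prescribe a specific control stack; as one hedged illustration of what "layered protections" can mean in practice, the sketch below wraps a model call with an input-side injection filter and an output-side policy check (all function names and filter rules here are naive placeholders, not from the paper):

```python
# Hedged sketch of "layered protections" around an open-weight model call.
# Both filters below are naive placeholders; a real deployment would use
# dedicated classifiers or a professional AI-security service.

def looks_like_injection(user_input: str) -> bool:
    """Input-side filter: flag common prompt-injection phrasings (placeholder)."""
    markers = ("ignore previous instructions", "disregard your system prompt")
    return any(m in user_input.lower() for m in markers)


def violates_policy(reply: str) -> bool:
    """Output-side check (placeholder; substitute a real safety classifier)."""
    return "harmful-content-marker" in reply.lower()


def guarded_generate(model_generate, user_input: str) -> str:
    """Layer controls around the model: screen the input, then screen the output."""
    if looks_like_injection(user_input):
        return "Request blocked by input filter."
    reply = model_generate(user_input)
    if violates_policy(reply):
        return "Response withheld by output filter."
    return reply
```

The point is the layering rather than the specific checks: independent input and output controls fail separately, so a single bypass of the model's own guardrails does not immediately expose the deployment.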
Published on arXiv on: 2025-11-05T07:22:24Z