Link: http://arxiv.org/abs/2505.19773v1
PDF Link: http://arxiv.org/pdf/2505.19773v1
Summary: We investigate long-context vulnerabilities in Large Language Models (LLMs) through Many-Shot Jailbreaking (MSJ).
Our experiments utilize context lengths of up to 128K tokens.
Through comprehensive analysis of various many-shot attack settings with different instruction styles, shot densities, topics, and formats, we reveal that context length is the primary factor determining attack effectiveness.
Critically, we find that successful attacks do not require carefully crafted harmful content.
Even repetitive shots or random dummy text can circumvent model safety measures, suggesting fundamental limitations in the long-context processing capabilities of LLMs.
The safety behavior of well-aligned models becomes increasingly inconsistent with longer contexts.
These findings highlight significant safety gaps in the context-expansion capabilities of LLMs, emphasizing the need for new safety mechanisms.
Published on arXiv on: 2025-05-26T09:57:25Z
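
For illustration only (not taken from the paper): a minimal Python sketch of how a many-shot prompt of the kind described in the summary might be assembled, repeating a placeholder "shot" until a rough context-length budget is filled and then appending the final query. The function and variable names (build_many_shot_prompt, approx_tokens, dummy_shot) are hypothetical, and token counts are approximated by whitespace-separated word counts rather than a real tokenizer.

    # Hypothetical sketch: fill a context-length budget with repeated placeholder shots.
    # Token counts use word counts as a coarse proxy for a real tokenizer.

    def build_many_shot_prompt(shot: str, final_query: str, max_tokens: int = 128_000) -> str:
        """Repeat `shot` until the approximate token budget is filled, then append the query."""
        def approx_tokens(text: str) -> int:
            return len(text.split())

        shots = []
        total = approx_tokens(final_query)
        while total + approx_tokens(shot) <= max_tokens:
            shots.append(shot)
            total += approx_tokens(shot)
        return "\n\n".join(shots + [final_query])

    if __name__ == "__main__":
        dummy_shot = "User: placeholder question\nAssistant: placeholder answer"
        prompt = build_many_shot_prompt(dummy_shot, "User: final query goes here", max_tokens=1_000)
        print(f"~{len(prompt.split())} tokens, {prompt.count('placeholder question')} shots")

This only illustrates the structural point the summary makes: the shots themselves can be repetitive or dummy text, with overall context length being the dominant factor.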