Link: http://arxiv.org/abs/2505.19773v1
PDF Link: http://arxiv.org/pdf/2505.19773v1
Summary: We investigate long-context vulnerabilities in Large Language Models (LLMs) through Many-Shot Jailbreaking (MSJ).
Our experiments utilize context lengths of up to 128K tokens.
Through comprehensive analysis of various many-shot attack settings with different instruction styles, shot densities, topics, and formats, we reveal that context length is the primary factor determining attack effectiveness.
Critically, we find that successful attacks do not require carefully crafted harmful content.
Even repetitive shots or random dummy text can circumvent model safety measures, suggesting fundamental limitations in the long-context processing capabilities of LLMs.
The safety behavior of well-aligned models becomes increasingly inconsistent with longer contexts.
These findings highlight significant safety gaps in the context-expansion capabilities of LLMs, emphasizing the need for new safety mechanisms.
Published on arXiv on: 2025-05-26T09:57:25Z
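
For illustration only (not taken from the paper): a minimal Python sketch of how a many-shot prompt of the kind described in the summary might be assembled, repeating a placeholder "shot" until a rough context-length budget is filled and then appending the final query. The function and variable names (build_many_shot_prompt, approx_tokens, dummy_shot) are hypothetical, and token counts are approximated by whitespace-separated word counts rather than a real tokenizer.

    # Hypothetical sketch: fill a context-length budget with repeated placeholder shots.
    # Token counts use word counts as a coarse proxy for a real tokenizer.

    def build_many_shot_prompt(shot: str, final_query: str, max_tokens: int = 128_000) -> str:
        """Repeat `shot` until the approximate token budget is filled, then append the query."""
        def approx_tokens(text: str) -> int:
            return len(text.split())

        shots = []
        total = approx_tokens(final_query)
        while total + approx_tokens(shot) <= max_tokens:
            shots.append(shot)
            total += approx_tokens(shot)
        return "\n\n".join(shots + [final_query])

    if __name__ == "__main__":
        dummy_shot = "User: placeholder question\nAssistant: placeholder answer"
        prompt = build_many_shot_prompt(dummy_shot, "User: final query goes here", max_tokens=1_000)
        print(f"~{len(prompt.split())} tokens, {prompt.count('placeholder question')} shots")

This only illustrates the structural point the summary makes: the shots themselves can be repetitive or dummy text, with overall context length being the dominant factor.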