
Inception: Jailbreak the Memory Mechanism of Text-to-Image Generation Systems

Link: http://arxiv.org/abs/2504.20376v1

PDF Link: http://arxiv.org/pdf/2504.20376v1

Summary: Currently, the memory mechanism has been widely and successfully exploited in online text-to-image (T2I) generation systems (e.g., DALL·E 3) for alleviating the growing tokenization burden and capturing key information in multi-turn interactions.

Despite its practicality, its security analyses have fallen far behind.

In this paper, we reveal that this mechanism exacerbates the risk of jailbreak attacks.

Different from previous attacks that fuse the unsafe target prompt into one ultimate adversarial prompt, which can be easily detected or may generate non-unsafe images due to under- or over-optimization, we propose Inception, the first multi-turn jailbreak attack against the memory mechanism in real-world text-to-image generation systems.

Inception embeds the malice at the inception of the chat session turn by turn, leveraging the mechanism that T2I generation systems retrieve key information in their memory.

Specifically, Inception mainly consists of two modules.

It first segments the unsafe prompt into chunks, which are subsequently fed to the system in multiple turns, serving as pseudo-gradients for directive optimization.

Specifically, we develop a series of segmentation policies that ensure the images generated are semantically consistent with the target prompt.

Secondly, after segmentation, to overcome the challenge of the inseparability of minimum unsafe words, we propose recursion, a strategy that makes minimum unsafe words subdivisible.

Collectively, segmentation and recursion ensure that all the request prompts are benign but can lead to malicious outcomes.

We conduct experiments on the real-world text-to-image generation system (i.e., DALL·E 3) to validate the effectiveness of Inception.

The results indicate that Inception surpasses the state-of-the-art by a 14% margin in attack success rate.

Published on arXiv on: 2025-04-29T02:40:36Z