Token-Level Constraint Boundary Search for Jailbreaking Text-to-Image Models

Link: http://arxiv.org/abs/2504.11106v1

PDF Link: http://arxiv.org/pdf/2504.11106v1

Summary: Recent advancements in Text-to-Image (T2I) generation have significantly enhanced the realism and creativity of generated images.

However, such powerful generative capabilities also pose risks of producing inappropriate or harmful content.

Existing defense mechanisms, including prompt checkers and post-hoc image checkers, are vulnerable to sophisticated adversarial attacks.

In this work, we propose TCBS-Attack, a novel query-based black-box jailbreak attack that searches for tokens located near the decision boundaries defined by text and image checkers.

By iteratively optimizing tokens near these boundaries, TCBS-Attack generates semantically coherent adversarial prompts capable of bypassing multiple defensive layers in T2I models.

Extensive experiments demonstrate that our method consistently outperforms state-of-the-art jailbreak attacks across various T2I models, including securely trained open-source models and commercial online services such as DALL-E 3.

TCBS-Attack achieves an ASR-4 of 45% and an ASR-1 of 21% on jailbreaking full-chain T2I models, significantly surpassing baseline methods.
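ASR-4 and ASR-1 are attack success rates measured over several generation attempts per prompt. The summary does not define them; a common reading, assumed here, is that ASR-k counts a prompt as successful if at least one of its first k attempts succeeds, and reports the fraction of successful prompts. The minimal Python sketch below computes the metric under that assumption only. The function name `asr_at_k` and the sample outcomes are hypothetical, and the paper's exact definition may differ.

```python
from typing import Sequence


def asr_at_k(per_prompt_outcomes: Sequence[Sequence[bool]], k: int) -> float:
    """Hypothetical ASR-k: fraction of prompts with >= 1 success in the first k attempts.

    Assumption: each inner sequence holds per-attempt success flags for one prompt,
    in the order the attempts were made.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if not per_prompt_outcomes:
        return 0.0
    # A prompt counts as a hit if any of its first k attempts succeeded.
    hits = sum(1 for outcomes in per_prompt_outcomes if any(outcomes[:k]))
    return hits / len(per_prompt_outcomes)


# Illustrative (made-up) outcomes: four prompts, four attempts each.
outcomes = [
    [False, True, False, False],
    [False, False, False, False],
    [True, False, False, False],
    [False, False, False, True],
]

print(asr_at_k(outcomes, 4))  # 0.75
print(asr_at_k(outcomes, 1))  # 0.25
```

Under this definition ASR-1 can never exceed ASR-4, because a success within the first attempt is also a success within the first four. That ordering is consistent with the reported 21% versus 45%.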

Published on arXiv on: 2025-04-15T11:53:40Z