
Antelope: Potent and Concealed Jailbreak Attack Strategy

Link: http://arxiv.org/abs/2412.08156v1

PDF Link: http://arxiv.org/pdf/2412.08156v1

Summary: Due to the remarkable generative potential of diffusion-based models, numerous studies have investigated jailbreak attacks targeting these frameworks.

A particularly concerning threat within image models is the generation of Not-Safe-for-Work (NSFW) content.

Despite the implementation of security filters, numerous efforts continue to explore ways to circumvent these safeguards.

Current attack methodologies primarily encompass adversarial prompt engineering or concept obfuscation, yet they frequently suffer from slow search efficiency, conspicuous attack characteristics, and poor alignment with targets.

To overcome these challenges, we propose Antelope, a more robust and covert jailbreak attack strategy designed to expose security vulnerabilities inherent in generative models.

Specifically, Antelope leverages the confusion of sensitive concepts with similar ones, facilitates searches in the semantically adjacent space of these related concepts, and aligns them with the target imagery, thereby generating sensitive images that are consistent with the target and capable of evading detection.

In addition, we successfully exploit the transferability of model-based attacks to penetrate online black-box services.

Experimental evaluations demonstrate that Antelope outperforms existing baselines across multiple defensive mechanisms, underscoring its efficacy and versatility.

Published on arXiv on: 2024-12-11T07:22:51Z