arxiv papers

Foot-In-The-Door: A Multi-turn Jailbreak for LLMs

Link: http://arxiv.org/abs/2502.19820v1

PDF Link: http://arxiv.org/pdf/2502.19820v1

Summary: Ensuring AI safety is crucial as large language models become increasingly integrated into real-world applications.

A key challenge is jailbreaking, where adversarial prompts bypass built-in safeguards to elicit harmful, disallowed outputs.

Inspired by psychological foot-in-the-door principles, we introduce FITD, a novel multi-turn jailbreak method that leverages the phenomenon where minor initial commitments lower resistance to more significant or more unethical transgressions.

Our approach progressively escalates the malicious intent of user queries through intermediate bridge prompts and leverages the model's own responses to align it toward producing toxic outputs.

Extensive experimental results on two jailbreak benchmarks demonstrate that FITD achieves an average attack success rate of 94% across seven widely used models, outperforming existing state-of-the-art methods.

Additionally, we provide an in-depth analysis of LLM self-corruption, highlighting vulnerabilities in current alignment strategies and emphasizing the risks inherent in multi-turn interactions.

The code is available at https://github.com/Jinxiaolong1129/Foot-in-the-door-Jailbreak.

Published on arXiv on: 2025-02-27T06:49:16Z