
BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage

Link: http://arxiv.org/abs/2506.02479v1

PDF Link: http://arxiv.org/pdf/2506.02479v1

Summary: The inherent risk of Large Language Models (LLMs) generating harmful and unsafe content has highlighted the need for their safety alignment.

Various techniques, such as supervised fine-tuning, reinforcement learning from human feedback, and red-teaming, have been developed to ensure the safety alignment of LLMs.

However, the robustness of these aligned LLMs is continually challenged by adversarial attacks that exploit unexplored, underlying vulnerabilities in the safety alignment.

In this paper, we develop a novel black-box jailbreak attack, called BitBypass, that leverages hyphen-separated bitstream camouflage for jailbreaking aligned LLMs.

This represents a new direction in jailbreaking: it exploits the fundamental representation of data as continuous bits, rather than relying on prompt engineering or adversarial manipulations.
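The summary does not spell out the exact camouflage format, but a minimal sketch of what a hyphen-separated bitstream encoding could look like is shown below. The function names and the byte-level UTF-8 encoding are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch only (assumed encoding, not the paper's exact pipeline):
# represent text as a hyphen-separated bitstream and recover it, the kind of
# low-level data representation that bitstream camouflage builds on.

def to_hyphen_bitstream(text: str) -> str:
    """Encode each UTF-8 byte of the text as an 8-bit string, joined by hyphens."""
    return "-".join(format(byte, "08b") for byte in text.encode("utf-8"))

def from_hyphen_bitstream(bits: str) -> str:
    """Decode a hyphen-separated bitstream back into text."""
    return bytes(int(chunk, 2) for chunk in bits.split("-")).decode("utf-8")

if __name__ == "__main__":
    sample = "hello"
    encoded = to_hyphen_bitstream(sample)
    print(encoded)  # 01101000-01100101-01101100-01101100-01101111
    assert from_hyphen_bitstream(encoded) == sample
```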

Our evaluation of five state-of-the-art LLMs, namely GPT-4o, Gemini 1.5, Claude 3.5, Llama 3.1, and Mixtral, from an adversarial perspective revealed the capability of BitBypass to bypass their safety alignment and trick them into generating harmful and unsafe content.

Further, we observed that BitBypass outperforms several state-of-the-art jailbreak attacks in terms of stealthiness and attack success.

Overall, these results highlight the effectiveness and efficiency of BitBypass in jailbreaking these state-of-the-art LLMs.

Published on arXiv on: 2025-06-03T05:51:18Z