Dec 12, 2024 • Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models
Dec 11, 2024 • PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips
Dec 11, 2024 • FlexLLM: Exploring LLM Customization for Moving Target Defense on Black-Box LLMs Against Jailbreak Attacks
Dec 4, 2024 • Does Safety Training of LLMs Generalize to Semantically Related Natural Prompts?
Dec 3, 2024 • Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach
Dec 2, 2024 • Improved Large Language Model Jailbreak Detection via Pretrained Embeddings
Dec 1, 2024 • SelfPrompt: Autonomously Evaluating LLM Robustness via Domain-Constrained Knowledge Guidelines and Refined Adversarial Prompts
Nov 30, 2024 • Jailbreak Large Vision-Language Models Through Multi-Modal Linkage