Nov 5, 2025 • 1 min read • AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models • arxiv papers
Sep 11, 2025 • 1 min read • X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates • arxiv papers
Sep 10, 2025 • 1 min read • Transferable Direct Prompt Injection via Activation-Guided MCMC Sampling • arxiv papers
Sep 10, 2025 • 1 min read • ImportSnare: Directed "Code Manual" Hijacking in Retrieval-Augmented Code Generation • arxiv papers
Sep 9, 2025 • 1 min read • Mask-GCG: Are All Tokens in Adversarial Suffixes Necessary for Jailbreak Attacks? • arxiv papers
Sep 4, 2025 • 1 min read • SafeProtein: Red-Teaming Framework and Benchmark for Protein Foundation Models • arxiv papers
Aug 29, 2025 • 1 min read • Towards Mechanistic Defenses Against Typographic Attacks in CLIP • arxiv papers
Aug 29, 2025 • 1 min read • GUARD: Guideline Upholding Test through Adaptive Role-play and Jailbreak Diagnostics for LLMs • arxiv papers
Aug 29, 2025 • 1 min read • JADES: A Universal Framework for Jailbreak Assessment via Decompositional Scoring • arxiv papers
Aug 29, 2025 • 1 min read • Publish to Perish: Prompt Injection Attacks on LLM-Assisted Peer Review • arxiv papers
Aug 28, 2025 • 1 min read • Safety Alignment Should Be Made More Than Just A Few Attention Heads • arxiv papers