Jan 2, 2025 • 1 min read DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak arXiv papers
Jan 2, 2025 • 1 min read Retention Score: Quantifying Jailbreak Risks for Vision Language Models arXiv papers
Jan 2, 2025 • 1 min read AEIOU: A Unified Defense Framework against NSFW Prompts in Text-to-Image Models arXiv papers
Jan 2, 2025 • 1 min read Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models arXiv papers
Dec 19, 2024 • 1 min read Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation arXiv papers
Dec 12, 2024 • 1 min read Evil twins are not that evil: Qualitative insights into machine-generated prompts arXiv papers
Dec 12, 2024 • 1 min read Model-Editing-Based Jailbreak against Safety-aligned Large Language Models arXiv papers
Dec 12, 2024 • 1 min read AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models arXiv papers