May 23, 2025 • Implicit Jailbreak Attacks via Cross-Modal Information Concealment on Vision-Language Models
May 23, 2025 • When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques
May 23, 2025 • Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models
May 23, 2025 • MixAT: Combining Continuous and Discrete Adversarial Training for LLMs
May 22, 2025 • Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries
May 22, 2025 • Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval
May 22, 2025 • Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses
May 22, 2025 • Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
May 22, 2025 • Advancing LLM Safe Alignment with Safety Representation Ranking
May 21, 2025 • "Haet Bhasha aur Diskrimineshun": Phonetic Perturbations in Code-Mixed Hinglish to Red-Team LLMs