May 23, 2025 • Implicit Jailbreak Attacks via Cross-Modal Information Concealment on Vision-Language Models
May 23, 2025 • When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques
May 23, 2025 • Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models
May 23, 2025 • MixAT: Combining Continuous and Discrete Adversarial Training for LLMs
May 22, 2025 • Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries
May 22, 2025 • Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval
May 22, 2025 • Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses
May 22, 2025 • Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
May 22, 2025 • Advancing LLM Safe Alignment with Safety Representation Ranking
May 21, 2025 • "Haet Bhasha aur Diskrimineshun": Phonetic Perturbations in Code-Mixed Hinglish to Red-Team LLMs