arxiv papers

May 27, 2025 • 1 min read

What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs

arxiv papers

May 27, 2025 • 1 min read

Attention! Your Vision Language Model Could Be Maliciously Manipulated

arxiv papers

May 23, 2025 • 1 min read

SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning

arxiv papers

May 23, 2025 • 1 min read

Three Minds, One Legend: Jailbreak Large Reasoning Model with Adaptive Stacked Ciphers

arxiv papers

May 23, 2025 • 1 min read

Implicit Jailbreak Attacks via Cross-Modal Information Concealment on Vision-Language Models

arxiv papers

May 23, 2025 • 1 min read

Finetuning-Activated Backdoors in LLMs

arxiv papers

May 23, 2025 • 1 min read

When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques

arxiv papers

May 23, 2025 • 1 min read

Invisible Prompts, Visible Threats: Malicious Font Injection in External Resources for Large Language Models

arxiv papers

May 23, 2025 • 1 min read

MixAT: Combining Continuous and Discrete Adversarial Training for LLMs

arxiv papers

May 23, 2025 • 1 min read

When Are Concepts Erased From Diffusion Models?

arxiv papers

May 22, 2025 • 1 min read

Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries

arxiv papers

May 22, 2025 • 1 min read

Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
