Feb 28, 2025 • 1 min read • Beyond the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models • arxiv papers
Feb 27, 2025 • 1 min read • JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models • arxiv papers
Feb 27, 2025 • 1 min read • Beyond Surface-Level Patterns: An Essence-Driven Defense Framework Against Jailbreak Attacks in LLMs • arxiv papers
Feb 25, 2025 • 1 min read • REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective • arxiv papers
Feb 21, 2025 • 1 min read • How Jailbreak Defenses Work and Ensemble? A Mechanistic Investigation • arxiv papers
Feb 21, 2025 • 1 min read • HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States • arxiv papers
Feb 20, 2025 • 1 min read • Exploiting Prefix-Tree in Structured Output Interfaces for Enhancing Jailbreak Attacking • arxiv papers
Feb 20, 2025 • 1 min read • Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region • arxiv papers
Feb 19, 2025 • 1 min read • Computational Safety for Generative AI: A Signal Processing Perspective • arxiv papers
Feb 19, 2025 • 1 min read • SoK: Understanding Vulnerabilities in the Large Language Model Supply Chain • arxiv papers