Feb 25, 2025 • 1 min read • REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective • arxiv papers
Feb 24, 2025 • 7 min read • Addressing Security Challenges in Large Language Models • weekly news about llm security
Feb 21, 2025 • 1 min read • How Jailbreak Defenses Work and Ensemble? A Mechanistic Investigation • arxiv papers
Feb 21, 2025 • 1 min read • HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States • arxiv papers
Feb 20, 2025 • 1 min read • Exploiting Prefix-Tree in Structured Output Interfaces for Enhancing Jailbreak Attacking • arxiv papers
Feb 20, 2025 • 1 min read • Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region • arxiv papers
Feb 19, 2025 • 1 min read • Computational Safety for Generative AI: A Signal Processing Perspective • arxiv papers
Feb 19, 2025 • 1 min read • SoK: Understanding Vulnerabilities in the Large Language Model Supply Chain • arxiv papers
Feb 19, 2025 • 1 min read • The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 • arxiv papers
Feb 19, 2025 • 1 min read • H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking • arxiv papers