Apr 3, 2025 • 1 min read Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks arXiv papers
Apr 3, 2025 • 1 min read Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning arXiv papers
Apr 3, 2025 • 1 min read LightDefense: A Lightweight Uncertainty-Driven Defense against Jailbreaks via Shifted Token Distribution arXiv papers
Apr 1, 2025 • 1 min read Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms arXiv papers
Mar 31, 2025 • 5 min read Securing Large Language Models: Exploring Vulnerabilities and Mitigation Strategies weekly news about LLM security
Mar 28, 2025 • 1 min read Prompt, Divide, and Conquer: Bypassing Large Language Model Safety Filters via Segmented and Distributed Prompt Processing arXiv papers
Mar 28, 2025 • 1 min read Harnessing Chain-of-Thought Metadata for Task Routing and Adversarial Prompt Detection arXiv papers
Mar 27, 2025 • 1 min read Iterative Prompting with Persuasion Skills in Jailbreaking Large Language Models arXiv papers