Jan 7, 2025 • 1 min read LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models arxiv papers
Jan 7, 2025 • 1 min read Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offs arxiv papers
Jan 7, 2025 • 1 min read CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models arxiv papers
Jan 7, 2025 • 1 min read Spot Risks Before Speaking! Unraveling Safety Attention Heads in Large Vision-Language Models arxiv papers
Jan 7, 2025 • 1 min read Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models arxiv papers
Jan 7, 2025 • 1 min read Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions arxiv papers
Jan 7, 2025 • 1 min read Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense arxiv papers
Jan 2, 2025 • 1 min read Divide and Conquer: A Hybrid Strategy Defeats Multimodal Large Language Models arxiv papers
Jan 2, 2025 • 1 min read Shaping the Safety Boundaries: Understanding and Defending Against Jailbreaks in Large Language Models arxiv papers
Jan 2, 2025 • 1 min read Robustness of Large Language Models Against Adversarial Attacks arxiv papers