May 30, 2025 • 1 min read Adaptive Jailbreaking Strategies Based on the Semantic Understanding Capabilities of Large Language Models arxiv papers
May 30, 2025 • 1 min read Fooling the Watchers: Breaking AIGC Detectors via Semantic Prompt Attacks arxiv papers
May 29, 2025 • 1 min read Adaptive Detoxification: Safeguarding General Capabilities of LLMs through Toxicity-Aware Knowledge Editing arxiv papers
May 29, 2025 • 1 min read Test-Time Immunization: A Universal Defense Framework Against Jailbreaks for (Multimodal) Large Language Models arxiv papers
May 29, 2025 • 1 min read Seeing the Threat: Vulnerabilities in Vision-Language Models to Adversarial Attack arxiv papers
May 28, 2025 • 1 min read Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space arxiv papers
May 27, 2025 • 1 min read Attention! Your Vision Language Model Could Be Maliciously Manipulated arxiv papers
May 27, 2025 • 1 min read What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs arxiv papers
May 27, 2025 • 1 min read SGM: A Framework for Building Specification-Guided Moderation Filters arxiv papers