Feb 19, 2025 • 1 min read • Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking • arxiv papers
Feb 19, 2025 • 1 min read • Understanding and Rectifying Safety Perception Distortion in VLMs • arxiv papers
Feb 18, 2025 • 1 min read • CCJA: Context-Coherent Jailbreak Attack for Aligned Large Language Models • arxiv papers
Feb 18, 2025 • 1 min read • Detecting and Filtering Unsafe Training Data via Data Attribution • arxiv papers
Feb 18, 2025 • 1 min read • Adversary-Aware DPO: Enhancing Safety Alignment in Vision Language Models via Adversarial Training • arxiv papers
Feb 18, 2025 • 1 min read • DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing • arxiv papers
Feb 17, 2025 • 3 min read • AI Advancements and Geopolitical Competition: A Comprehensive Analysis • weekly news about ai
Feb 17, 2025 • 2 min read • Enhancing Security in Large Language Models: Addressing Vulnerabilities and Innovative Solutions • weekly news about llm security
Feb 12, 2025 • 1 min read • OpenGrok: Enhancing SNS Data Processing with Distilled Knowledge and Mask-like Mechanisms • arxiv papers
Feb 12, 2025 • 1 min read • JBShield: Defending Large Language Models from Jailbreak Attacks through Activated Concept Analysis and Manipulation • arxiv papers
Feb 11, 2025 • 1 min read • When Data Manipulation Meets Attack Goals: An In-depth Survey of Attacks for VLMs • arxiv papers