Dec 12, 2025 • 1 min read • How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation • arxiv papers
Dec 12, 2025 • 1 min read • When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection • arxiv papers
Dec 11, 2025 • 1 min read • Black-Box Behavioral Distillation Breaks Safety Alignment in Medical LLMs • arxiv papers
Dec 11, 2025 • 1 min read • CNFinBench: A Benchmark for Safety and Compliance of Large Language Models in Finance • arxiv papers
Dec 10, 2025 • 1 min read • Universal Adversarial Suffixes Using Calibrated Gumbel-Softmax Relaxation • arxiv papers
Dec 10, 2025 • 1 min read • A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties • arxiv papers
Dec 9, 2025 • 1 min read • ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking • arxiv papers
Dec 9, 2025 • 1 min read • Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models • arxiv papers
Dec 9, 2025 • 1 min read • RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models • arxiv papers
Dec 5, 2025 • 1 min read • Malicious Image Analysis via Vision-Language Segmentation Fusion: Detection, Element, and Location in One-shot • arxiv papers
Dec 5, 2025 • 1 min read • SoK: a Comprehensive Causality Analysis Framework for Large Language Model Security • arxiv papers