arxiv papers

Dec 12, 2025 • 1 min read

How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation

arxiv papers

Dec 12, 2025 • 1 min read

When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection

arxiv papers

Dec 11, 2025 • 1 min read

Black-Box Behavioral Distillation Breaks Safety Alignment in Medical LLMs

arxiv papers

Dec 11, 2025 • 1 min read

CNFinBench: A Benchmark for Safety and Compliance of Large Language Models in Finance

arxiv papers

Dec 10, 2025 • 1 min read

Universal Adversarial Suffixes Using Calibrated Gumbel-Softmax Relaxation

arxiv papers

Dec 10, 2025 • 1 min read

Robust Agents in Open-Ended Worlds

arxiv papers

Dec 10, 2025 • 1 min read

A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties

arxiv papers

Dec 9, 2025 • 1 min read

ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking

arxiv papers

Dec 9, 2025 • 1 min read

Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models

arxiv papers

Dec 9, 2025 • 1 min read

RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models

arxiv papers

Dec 5, 2025 • 1 min read

Malicious Image Analysis via Vision-Language Segmentation Fusion: Detection, Element, and Location in One-shot

arxiv papers

Dec 5, 2025 • 1 min read

SoK: a Comprehensive Causality Analysis Framework for Large Language Model Security

arxiv papers