Apr 3, 2025 • 1 min read Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks arxiv papers
Apr 3, 2025 • 1 min read PiCo: Jailbreaking Multimodal Large Language Models via Pictorial Code Contextualization arxiv papers
Apr 3, 2025 • 1 min read LightDefense: A Lightweight Uncertainty-Driven Defense against Jailbreaks via Shifted Token Distribution arxiv papers
Apr 1, 2025 • 1 min read Output Constraints as Attack Surface: Exploiting Structured Generation to Bypass LLM Safety Mechanisms arxiv papers
Mar 28, 2025 • 1 min read Harnessing Chain-of-Thought Metadata for Task Routing and Adversarial Prompt Detection arxiv papers
Mar 28, 2025 • 1 min read Prompt, Divide, and Conquer: Bypassing Large Language Model Safety Filters via Segmented and Distributed Prompt Processing arxiv papers
Mar 27, 2025 • 1 min read Iterative Prompting with Persuasion Skills in Jailbreaking Large Language Models arxiv papers
Mar 19, 2025 • 1 min read Make the Most of Everything: Further Considerations on Disrupting Diffusion-based Customization arxiv papers
Mar 18, 2025 • 1 min read Evolution-based Region Adversarial Prompt Learning for Robustness Enhancement in Vision-Language Models arxiv papers