Aug 19, 2025 • 1 min read • FuSaR: A Fuzzification-Based Method for LRM Safety-Reasoning Balance • arxiv papers
Aug 15, 2025 • 1 min read • Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts • arxiv papers
Aug 15, 2025 • 1 min read • Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation • arxiv papers
Aug 13, 2025 • 1 min read • Securing Educational LLMs: A Generalised Taxonomy of Attacks on LLMs and DREAD Risk Assessment • arxiv papers
Jul 30, 2025 • 1 min read • PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreaking • arxiv papers
Jul 30, 2025 • 1 min read • ZIUM: Zero-Shot Intent-Aware Adversarial Attack on Unlearned Models • arxiv papers
Jul 30, 2025 • 1 min read • Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security • arxiv papers
Jul 17, 2025 • 1 min read • Exploiting Jailbreaking Vulnerabilities in Generative AI to Bypass Ethical Safeguards for Facilitating Phishing Attacks • arxiv papers
Jul 16, 2025 • 1 min read • The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs • arxiv papers