Mar 18, 2025 • 1 min read • MirrorGuard: Adaptive Defense Against Jailbreaks via Entropy-Guided Mirror Crafting • arxiv papers
Mar 14, 2025 • 1 min read • Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search • arxiv papers
Mar 13, 2025 • 1 min read • JBFuzz: Jailbreaking LLMs Efficiently and Effectively Using Fuzzing • arxiv papers
Mar 13, 2025 • 1 min read • Probing Latent Subspaces in LLM for AI Security: Identifying and Manipulating Adversarial States • arxiv papers
Mar 13, 2025 • 1 min read • Sparse Autoencoder as a Zero-Shot Classifier for Concept Erasing in Text-to-Image Diffusion Models • arxiv papers
Mar 12, 2025 • 1 min read • Dialogue Injection Attack: Jailbreaking LLMs through Context Manipulation • arxiv papers
Mar 11, 2025 • 1 min read • Utilizing Jailbreak Probability to Attack and Safeguard Multimodal LLMs • arxiv papers
Mar 11, 2025 • 1 min read • TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models • arxiv papers
Mar 6, 2025 • 1 min read • Improving LLM Safety Alignment with Dual-Objective Optimization • arxiv papers
Feb 28, 2025 • 1 min read • Developmental Support Approach to AI's Autonomous Growth: Toward the Realization of a Mutually Beneficial Stage Through Experiential Learning • arxiv papers