Apr 11, 2025 • 1 min read Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge arxiv papers
Apr 9, 2025 • 1 min read Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking arxiv papers
Apr 9, 2025 • 1 min read Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking arxiv papers
Apr 8, 2025 • 1 min read Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models arxiv papers
Apr 8, 2025 • 1 min read Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models arxiv papers
Apr 8, 2025 • 1 min read A Domain-Based Taxonomy of Jailbreak Vulnerabilities in Large Language Models arxiv papers
Apr 7, 2025 • 5 min read Exploring the AI Landscape: Recent Developments, Innovations, and Ethical Considerations weekly news about ai
Apr 7, 2025 • 4 min read Enhancing Large Language Model Security: Challenges and Solutions Analysis weekly news about llm security
Apr 4, 2025 • 1 min read More is Less: The Pitfalls of Multi-Model Synthetic Preference Data in DPO Safety Alignment arxiv papers
Apr 4, 2025 • 1 min read LLMs as Deceptive Agents: How Role-Based Prompting Induces Semantic Ambiguity in Puzzle Tasks arxiv papers
Apr 3, 2025 • 1 min read Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning arxiv papers