May 22, 2025 • 1 min read • Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models • arxiv papers
May 21, 2025 • 1 min read • Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders • arxiv papers
May 21, 2025 • 1 min read • "Haet Bhasha aur Diskrimineshun": Phonetic Perturbations in Code-Mixed Hinglish to Red-Team LLMs • arxiv papers
May 21, 2025 • 1 min read • PandaGuard: Systematic Evaluation of LLM Safety in the Era of Jailbreaking Attacks • arxiv papers
May 21, 2025 • 1 min read • SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment • arxiv papers
May 21, 2025 • 1 min read • Exploring Jailbreak Attacks on LLMs through Intent Concealment and Diversion • arxiv papers
May 21, 2025 • 1 min read • AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models • arxiv papers
May 20, 2025 • 1 min read • I'll believe it when I see it: Images increase misinformation sharing in Vision-Language Models • arxiv papers
May 19, 2025 • 5 min read • The Evolving Landscape of Artificial Intelligence: Trends, Innovations, and Ethical Considerations • weekly news about ai
May 19, 2025 • 3 min read • Exploring Large Language Models Security Landscape: Challenges, Vulnerabilities, and Solutions. • weekly news about llm security