Jul 9, 2025 • 1 min read The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation arxiv papers
Jul 9, 2025 • 1 min read TuneShield: Mitigating Toxicity in Conversational AI while Fine-tuning on Untrusted Data arxiv papers
Jul 9, 2025 • 1 min read CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations arxiv papers
Jul 8, 2025 • 1 min read Trojan Horse Prompting: Jailbreaking Conversational Multimodal Models by Forging Assistant Message arxiv papers
Jul 8, 2025 • 1 min read Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models arxiv papers
Jul 7, 2025 • 3 min read The Evolving Landscape of AI in 2025: Trends, Challenges, and Future Prospects weekly news about ai
Jul 7, 2025 • 2 min read Securing Large Language Models: Addressing Vulnerabilities, Ethical Concerns, and Future Trends weekly news about llm security
Jul 4, 2025 • 1 min read PII Jailbreaking in LLMs via Activation Steering Reveals Personal Information Leakage arxiv papers
Jul 4, 2025 • 1 min read Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models arxiv papers
Jul 4, 2025 • 1 min read Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection arxiv papers
Jul 3, 2025 • 1 min read SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism arxiv papers
Jul 1, 2025 • 1 min read Evaluating Multi-Agent Defences Against Jailbreaking Attacks on Large Language Models arxiv papers