Aug 19, 2025 • 1 min read • CorrSteer: Steering Improves Task Performance and Safety in LLMs through Correlation-based Sparse Autoencoder Feature Selection • arxiv papers
Aug 19, 2025 • 1 min read • MAJIC: Markovian Adaptive Jailbreaking via Iterative Composition of Diverse Innovative Strategies • arxiv papers
Aug 19, 2025 • 1 min read • FuSaR: A Fuzzification-Based Method for LRM Safety-Reasoning Balance • arxiv papers
Aug 18, 2025 • 3 min read • The Impact of AI: Revolutionizing Industries with Ethical Considerations and Regulation • weekly news about ai
Aug 18, 2025 • 1 min read • Ensuring Security in Large Language Models: Challenges, Best Practices, and Future Trends • weekly news about llm security
Aug 15, 2025 • 1 min read • Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts • arxiv papers
Aug 15, 2025 • 1 min read • Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation • arxiv papers
Aug 13, 2025 • 1 min read • Securing Educational LLMs: A Generalised Taxonomy of Attacks on LLMs and DREAD Risk Assessment • arxiv papers
Jul 30, 2025 • 1 min read • PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreaking • arxiv papers
Jul 30, 2025 • 1 min read • ZIUM: Zero-Shot Intent-Aware Adversarial Attack on Unlearned Models • arxiv papers