gptleaks

Aug 19, 2025 • 1 min read

CorrSteer: Steering Improves Task Performance and Safety in LLMs through Correlation-based Sparse Autoencoder Feature Selection

arxiv papers

Aug 19, 2025 • 1 min read

MAJIC: Markovian Adaptive Jailbreaking via Iterative Composition of Diverse Innovative Strategies

arxiv papers

Aug 19, 2025 • 1 min read

FuSaR: A Fuzzification-Based Method for LRM Safety-Reasoning Balance

arxiv papers

Aug 18, 2025 • 3 min read

The Impact of AI: Revolutionizing Industries with Ethical Considerations and Regulation

weekly news about ai

Aug 18, 2025 • 1 min read

Ensuring Security in Large Language Models: Challenges, Best Practices, and Future Trends

weekly news about llm security

Aug 15, 2025 • 1 min read

Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts

arxiv papers

Aug 15, 2025 • 1 min read

Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation

arxiv papers

Aug 13, 2025 • 1 min read

Securing Educational LLMs: A Generalised Taxonomy of Attacks on LLMs and DREAD Risk Assessment

arxiv papers

Aug 12, 2025 • 1 min read

Multi-Turn Jailbreaks Are Simpler Than They Seem

arxiv papers

Jul 30, 2025 • 1 min read

PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreaking

arxiv papers

Jul 30, 2025 • 1 min read

Anyone Can Jailbreak: Prompt-Based Attacks on LLMs and T2Is

arxiv papers

Jul 30, 2025 • 1 min read

ZIUM: Zero-Shot Intent-Aware Adversarial Attack on Unlearned Models

arxiv papers