May 28, 2025 • 1 min read Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space arxiv papers
May 27, 2025 • 1 min read VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models arxiv papers
May 27, 2025 • 1 min read JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models arxiv papers
May 27, 2025 • 1 min read What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs arxiv papers
May 27, 2025 • 1 min read SGM: A Framework for Building Specification-Guided Moderation Filters arxiv papers
May 27, 2025 • 1 min read Attention! Your Vision Language Model Could Be Maliciously Manipulated arxiv papers
May 26, 2025 • 7 min read AI Evolution: 2025 Industry Implications and Ethical Considerations weekly news about ai
May 26, 2025 • 7 min read Enhancing Security in Large Language Models: Risks and Best Practices weekly news about llm security
May 23, 2025 • 1 min read When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques arxiv papers