Feb 25, 2025 • 1 min read • REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective • arxiv papers
Feb 24, 2025 • 7 min read • Addressing Security Challenges in Large Language Models • weekly news about llm security
Feb 21, 2025 • 1 min read • How Jailbreak Defenses Work and Ensemble? A Mechanistic Investigation • arxiv papers
Feb 21, 2025 • 1 min read • HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States • arxiv papers
Feb 20, 2025 • 1 min read • Exploiting Prefix-Tree in Structured Output Interfaces for Enhancing Jailbreak Attacking • arxiv papers
Feb 20, 2025 • 1 min read • Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region • arxiv papers
Feb 19, 2025 • 1 min read • Computational Safety for Generative AI: A Signal Processing Perspective • arxiv papers
Feb 19, 2025 • 1 min read • SoK: Understanding Vulnerabilities in the Large Language Model Supply Chain • arxiv papers
Feb 19, 2025 • 1 min read • The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 • arxiv papers
Feb 19, 2025 • 1 min read • H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking • arxiv papers