gptleaks

Jan 30, 2025 • 1 min read

RICoTA: Red-teaming of In-the-wild Conversation with Test Attempts

arxiv papers

Jan 29, 2025 • 1 min read

xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking

arxiv papers

Jan 24, 2025 • 1 min read

Tune In, Act Up: Exploring the Impact of Audio Modality-Specific Edits on Large Audio Language Models in Jailbreak

arxiv papers

Jan 22, 2025 • 1 min read

CogMorph: Cognitive Morphing Attacks for Text-to-Image Models

arxiv papers

Jan 22, 2025 • 1 min read

You Can't Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense

arxiv papers

Jan 17, 2025 • 1 min read

A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy

arxiv papers

Jan 16, 2025 • 1 min read

SAIF: A Comprehensive Framework for Evaluating the Risks of Generative AI in the Public Sector

arxiv papers

Jan 15, 2025 • 1 min read

Self-Instruct Few-Shot Jailbreaking: Decompose the Attack into Pattern and Behavior Learning

arxiv papers

Jan 10, 2025 • 1 min read

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

arxiv papers

Jan 7, 2025 • 1 min read

AEIOU: A Unified Defense Framework against NSFW Prompts in Text-to-Image Models

arxiv papers

Jan 7, 2025 • 1 min read

Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models

arxiv papers

Jan 7, 2025 • 1 min read

InfAlign: Inference-aware language model alignment

arxiv papers