gptleaks

Aug 28, 2025 • 1 min read

Evaluating Language Model Reasoning about Confidential Information

arxiv papers

Aug 28, 2025 • 1 min read

Forewarned is Forearmed: Pre-Synthesizing Jailbreak-like Instructions to Enhance LLM Safety Guardrail to Potential Attacks

arxiv papers

Aug 26, 2025 • 1 min read

Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models

arxiv papers

Aug 26, 2025 • 1 min read

Speculative Safety-Aware Decoding

arxiv papers

Aug 25, 2025 • 1 min read

Exploring the Evolution of Artificial Intelligence: Trends, Innovations, and Challenges

weekly news about ai

Aug 25, 2025 • 5 min read

Securing Large Language Models: Understanding Vulnerabilities and Mitigation Strategies

weekly news about llm security

Aug 22, 2025 • 1 min read

SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak Attacks

arxiv papers

Aug 22, 2025 • 1 min read

Retrieval-Augmented Review Generation for Poisoning Recommender Systems

arxiv papers

Aug 22, 2025 • 1 min read

Adversarial Attacks against Neural Ranking Models via In-Context Learning

arxiv papers

Aug 22, 2025 • 1 min read

SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language Models

arxiv papers

Aug 21, 2025 • 1 min read

Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent

arxiv papers

Aug 20, 2025 • 1 min read

Sycophancy under Pressure: Evaluating and Mitigating Sycophantic Bias via Adversarial Dialogues in Scientific QA

arxiv papers