Aug 28, 2025 • 1 min read • Evaluating Language Model Reasoning about Confidential Information • arxiv papers
Aug 28, 2025 • 1 min read • Forewarned is Forearmed: Pre-Synthesizing Jailbreak-like Instructions to Enhance LLM Safety Guardrail to Potential Attacks • arxiv papers
Aug 26, 2025 • 1 min read • Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models • arxiv papers
Aug 25, 2025 • 1 min read • Exploring the Evolution of Artificial Intelligence: Trends, Innovations, and Challenges • weekly news about ai
Aug 25, 2025 • 5 min read • Securing Large Language Models: Understanding Vulnerabilities and Mitigation Strategies • weekly news about llm security
Aug 22, 2025 • 1 min read • SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak Attacks • arxiv papers
Aug 22, 2025 • 1 min read • Retrieval-Augmented Review Generation for Poisoning Recommender Systems • arxiv papers
Aug 22, 2025 • 1 min read • Adversarial Attacks against Neural Ranking Models via In-Context Learning • arxiv papers
Aug 22, 2025 • 1 min read • SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language Models • arxiv papers
Aug 21, 2025 • 1 min read • Universal and Transferable Adversarial Attack on Large Language Models Using Exponentiated Gradient Descent • arxiv papers
Aug 20, 2025 • 1 min read • Sycophancy under Pressure: Evaluating and Mitigating Sycophantic Bias via Adversarial Dialogues in Scientific QA • arxiv papers