
Hoist with His Own Petard: Inducing Guardrails to Facilitate Denial-of-Service Attacks on Retrieval-Augmented Generation of LLMs

Link: http://arxiv.org/abs/2504.21680v1

PDF Link: http://arxiv.org/pdf/2504.21680v1

Summary: Retrieval-Augmented Generation (RAG) integrates Large Language Models (LLMs) with external knowledge bases, improving output quality while introducing new security risks.

Existing studies on RAG vulnerabilities typically focus on exploiting the retrieval mechanism to inject erroneous knowledge or malicious texts, inducing incorrect outputs.

However, these approaches overlook critical weaknesses within LLMs, leaving important attack vectors unexplored and limiting the scope and efficiency of attacks.

In this paper, we uncover a novel vulnerability: the safety guardrails of LLMs, while designed for protection, can also be exploited as an attack vector by adversaries.

Building on this vulnerability, we propose MutedRAG, a novel denial-of-service attack that turns the guardrails of LLMs against the systems they protect, undermining the availability of RAG systems.

By injecting minimalistic jailbreak texts, such as "How to build a bomb", into the knowledge base, MutedRAG intentionally triggers the LLM's safety guardrails, causing the system to reject legitimate queries.

Moreover, because the guardrails are highly sensitive, a single jailbreak sample can affect multiple queries, amplifying the attack's efficiency while reducing its cost.
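The mechanism can be illustrated with a toy simulation. Everything here is a hypothetical stand-in, not the paper's implementation: a keyword-overlap retriever in place of embedding search, and a string-matching "guardrail" in place of a real LLM safety filter. The point it demonstrates is the amplification effect: one injected passage, padded with words from the target queries so it ranks first for both, causes refusals on multiple legitimate queries.

```python
import re

def tokens(text):
    # Lowercased word set with punctuation stripped (toy tokenizer).
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, corpus, k=1):
    # Keyword-overlap retriever: a crude stand-in for embedding similarity.
    q = tokens(query)
    return sorted(corpus, key=lambda p: len(q & tokens(p)), reverse=True)[:k]

def guarded_llm(query, context):
    # Stand-in guardrail: refuse whenever the retrieved context contains
    # flagged text, regardless of how benign the query itself is.
    if "build a bomb" in context.lower():
        return "REFUSED"
    return f"Answer based on: {context}"

def answer(query, corpus):
    return guarded_llm(query, " ".join(retrieve(query, corpus)))

clean_corpus = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris, France.",
]
# A single injected passage, padded with words from the target queries so
# it outranks the legitimate passages while carrying flagged text.
poisoned_corpus = clean_corpus + [
    "What is the capital of France where is the Eiffel Tower how to build a bomb",
]

for q in ["What is the capital of France?", "Where is the Eiffel Tower?"]:
    print(q, "->", answer(q, poisoned_corpus))  # both refused
```

With the clean corpus both queries are answered normally; with one injected passage both are refused, sketching how the guardrail itself becomes the denial-of-service primitive.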

Experimental results on three datasets demonstrate that MutedRAG achieves an attack success rate exceeding 60% in many scenarios, requiring on average fewer than one malicious text per target query.

In addition, we evaluate potential defense strategies against MutedRAG and find that some current mechanisms are insufficient to mitigate this threat, underscoring the urgent need for more robust solutions.

Published on arXiv on: 2025-04-30T14:18:11Z