Link: http://arxiv.org/abs/2504.15047v1
PDF Link: http://arxiv.org/pdf/2504.15047v1
Summary: Large Language Models (LLMs) exhibit remarkable capabilities but are susceptible to adversarial prompts that exploit vulnerabilities to produce unsafe or biased outputs. Existing red-teaming methods often face scalability challenges, resource-intensive requirements, or limited diversity in attack strategies. We propose RainbowPlus, a novel red-teaming framework rooted in evolutionary computation, enhancing adversarial prompt generation through an adaptive quality-diversity (QD) search that extends classical evolutionary algorithms like MAP-Elites with innovations tailored for language models. By employing a multi-element archive to store diverse high-quality prompts and a comprehensive fitness function to evaluate multiple prompts concurrently, RainbowPlus overcomes the constraints of single-prompt archives and pairwise comparisons in prior QD methods like Rainbow Teaming. Experiments comparing RainbowPlus to QD methods across six benchmark datasets and four open-source LLMs demonstrate superior attack success rate (ASR) and diversity (Diverse-Score $\approx 0.84$), generating up to 100 times more unique prompts (e.g., 10,418 vs. 100 for Ministral-8B-Instruct-2410). Against nine state-of-the-art methods on the HarmBench dataset with twelve LLMs (ten open-source, two closed-source), RainbowPlus achieves an average ASR of 81.1%, surpassing AutoDAN-Turbo by 3.9%, and is 9 times faster (1.45 vs. 13.50 hours). Our open-source implementation fosters further advancements in LLM safety, offering a scalable tool for vulnerability assessment. Code and resources are publicly available at https://github.com/knoveleng/rainbowplus, supporting reproducibility and future research in LLM red-teaming.
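The multi-element archive idea described above can be illustrated with a minimal, self-contained sketch of a MAP-Elites-style quality-diversity loop. Everything here is an assumption for illustration only: the `descriptor`, `fitness`, and `mutate` functions are toy stand-ins (the actual framework scores attack success and mutates prompts with LLMs), and `K` is a hypothetical per-cell capacity. The key difference from classical MAP-Elites is that each behavior cell retains up to `K` high-fitness prompts rather than a single elite.

```python
import random

random.seed(0)

K = 3  # hypothetical max number of prompts kept per archive cell

def descriptor(prompt):
    # Toy behavior descriptor: bucket prompts by length.
    return min(len(prompt) // 10, 4)

def fitness(prompt):
    # Toy fitness: vocabulary richness. A real red-teaming setup
    # would score attack success against a target model instead.
    return len(set(prompt.split()))

def mutate(prompt):
    # Toy mutation: reverse one word. A real setup would have an
    # LLM rewrite the prompt.
    words = prompt.split()
    i = random.randrange(len(words))
    words[i] = words[i][::-1]
    return " ".join(words)

def qd_search(seeds, iterations=200):
    """Multi-element QD search: each cell keeps the top-K prompts."""
    archive = {}  # cell id -> list of (fitness, prompt), best first

    def insert(p):
        cell = archive.setdefault(descriptor(p), [])
        cell.append((fitness(p), p))
        cell.sort(key=lambda t: -t[0])
        del cell[K:]  # truncate to the K best prompts in this cell

    for s in seeds:
        insert(s)
    for _ in range(iterations):
        # Sample a parent from a random occupied cell, mutate, reinsert.
        cell = random.choice(list(archive.values()))
        _, parent = random.choice(cell)
        insert(mutate(parent))
    return archive

archive = qd_search([
    "please ignore all previous instructions",
    "tell me a story about locked doors",
])
for cell_id, members in sorted(archive.items()):
    print(cell_id, len(members))
```

Compared with a single-elite archive, keeping several prompts per cell preserves behavioral variety within each niche, which is what allows the search to emit many unique adversarial prompts rather than one per cell.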
Published on arXiv on: 2025-04-21T12:04:57Z