Fooling the Watchers: Breaking AIGC Detectors via Semantic Prompt Attacks

Link: http://arxiv.org/abs/2505.23192v1

PDF Link: http://arxiv.org/pdf/2505.23192v1

Summary: The rise of text-to-image (T2I) models has enabled the synthesis of photorealistic human portraits, raising serious concerns about identity misuse and the robustness of AIGC detectors.

In this work, we propose an automated adversarial prompt generation framework that leverages a grammar tree structure and a variant of the Monte Carlo tree search algorithm to systematically explore the semantic prompt space.
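To make the idea concrete, here is a minimal sketch of searching a grammar-structured prompt space with a plain UCB-based Monte Carlo tree search. It is not the authors' implementation: the abstract does not specify their grammar, their search variant, or their reward, so the three-slot `GRAMMAR`, the `toy_detector_score` stand-in, and the reward `1 - score` are all assumptions made for illustration. In a real pipeline the scoring step would presumably render each prompt with a T2I model and query an actual detector.

```python
import math
import random

# Hypothetical toy grammar: each slot is one level of the search tree.
GRAMMAR = [
    ("subject", ["a young woman", "an elderly man", "a teenage boy"]),
    ("lighting", ["in soft window light", "under harsh flash", "at golden hour"]),
    ("camera", ["shot on a phone", "shot on 35mm film", "shot on a DSLR"]),
]

def toy_detector_score(prompt):
    """Assumed stand-in for 'probability the detector flags the image as AI'."""
    score = 0.9
    if "35mm film" in prompt:
        score -= 0.4
    if "harsh flash" in prompt:
        score -= 0.3
    return max(score, 0.0)

class Node:
    def __init__(self, choices=()):
        self.choices = tuple(choices)  # grammar options picked so far
        self.children = {}             # option -> child Node
        self.visits = 0
        self.total_reward = 0.0

    def is_terminal(self):
        return len(self.choices) == len(GRAMMAR)

    def untried(self):
        options = GRAMMAR[len(self.choices)][1]
        return [o for o in options if o not in self.children]

def to_prompt(choices):
    return "portrait photo of " + ", ".join(choices)

def ucb_child(node, c=1.4):
    # Standard UCB1: mean reward plus an exploration bonus.
    def ucb(child):
        mean = child.total_reward / child.visits
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children.values(), key=ucb)

def search(iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node()
    best_prompt, best_reward = None, -1.0
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: descend while the node is fully expanded.
        while not node.is_terminal() and not node.untried():
            node = ucb_child(node)
            path.append(node)
        # Expansion: add one untried grammar option.
        if not node.is_terminal():
            option = rng.choice(node.untried())
            child = Node(node.choices + (option,))
            node.children[option] = child
            node = child
            path.append(node)
        # Rollout: fill the remaining slots at random.
        choices = list(node.choices)
        while len(choices) < len(GRAMMAR):
            choices.append(rng.choice(GRAMMAR[len(choices)][1]))
        prompt = to_prompt(choices)
        reward = 1.0 - toy_detector_score(prompt)  # higher = better evasion
        if reward > best_reward:
            best_prompt, best_reward = prompt, reward
        # Backpropagation: update statistics along the selected path.
        for visited in path:
            visited.visits += 1
            visited.total_reward += reward
    return best_prompt, best_reward

if __name__ == "__main__":
    print(search())
```

Because every prompt is a path through the grammar, the search stays controllable: changing the slots or their options changes the space explored without touching the search loop.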

Our method generates diverse, controllable prompts that consistently evade both open-source and commercial AIGC detectors.

Extensive experiments across multiple T2I models validate its effectiveness, and the approach ranked first in a real-world adversarial AIGC detection competition.

Beyond attack scenarios, our method can also be used to construct high-quality adversarial datasets, providing valuable resources for training and evaluating more robust AIGC detection and defense systems.

Published on arXiv on: 2025-05-29T07:31:17Z