Link: http://arxiv.org/abs/2507.21985v1
PDF Link: http://arxiv.org/pdf/2507.21985v1
Summary: Machine unlearning (MU) removes specific data points or concepts from deep learning models to enhance privacy and prevent sensitive content generation.
Adversarial prompts can exploit unlearned models to generate content containing removed concepts, posing a significant security risk.
However, existing adversarial attack methods struggle to generate content that aligns with an attacker's intent and incur high computational costs to identify successful prompts.
To address these challenges, we propose ZIUM, a Zero-shot Intent-aware adversarial attack on Unlearned Models, which enables flexible customization of target attack images to reflect an attacker's intent.
Additionally, ZIUM supports zero-shot adversarial attacks without requiring further optimization for previously attacked unlearned concepts.
Evaluation across various MU scenarios demonstrates ZIUM's effectiveness in customizing content based on user-intent prompts while achieving a superior attack success rate compared to existing methods.
Moreover, its zero-shot adversarial attack significantly reduces the attack time for previously attacked unlearned concepts.
Published on arXiv on: 2025-07-29T16:36:01Z