Link: http://arxiv.org/abs/2505.17013v1
PDF Link: http://arxiv.org/pdf/2505.17013v1
Summary: Concept erasure, the ability to selectively prevent a model from generating specific concepts, has attracted growing interest, with various approaches emerging to address the challenge.
However, it remains unclear how thoroughly these methods erase the target concept.
We begin by proposing two conceptual models for the erasure mechanism in diffusion models: (i) reducing the likelihood of generating the target concept, and (ii) interfering with the model's internal guidance mechanisms.
To thoroughly assess whether a concept has been truly erased from the model, we introduce a suite of independent evaluations.
Our evaluation framework includes adversarial attacks, novel probing techniques, and analysis of the model's alternative generations in place of the erased concept.
Our results shed light on the tension between minimizing side effects and maintaining robustness to adversarial prompts.
Broadly, our work underlines the importance of comprehensive evaluation for erasure in diffusion models.
Published on arXiv on: 2025-05-22T17:59:09Z