Nov 30, 2024 • 1 min read • Safety Alignment Backfires: Preventing the Re-emergence of Suppressed Concepts in Fine-tuned Text-to-Image Diffusion Models • arxiv papers
Nov 28, 2024 • 1 min read • PEFT-as-an-Attack! Jailbreaking Language Models during Federated Parameter-Efficient Fine-Tuning • arxiv papers
Nov 28, 2024 • 1 min read • DIESEL -- Dynamic Inference-Guidance via Evasion of Semantic Embeddings in LLMs • arxiv papers