
Evaluation of LLM Vulnerabilities to Being Misused for Personalized Disinformation Generation

Link: http://arxiv.org/abs/2412.13666v1

PDF Link: http://arxiv.org/pdf/2412.13666v1

Summary: The capability of recent large language models (LLMs) to generate high-quality content indistinguishable by humans from human-written texts raises many concerns regarding their misuse.

Previous research has shown that LLMs can be effectively misused for generating disinformation news articles following predefined narratives.

Their capabilities to generate personalized (in various aspects) content have also been evaluated and mostly found usable.

However, a combination of the personalization and disinformation abilities of LLMs has not been comprehensively studied yet.

Such a dangerous combination should trigger the integrated safety filters of the LLMs, if any exist.

This study fills this gap by evaluating the vulnerabilities of recent open and closed LLMs, and their willingness to generate personalized disinformation news articles in English.

We further explore whether the LLMs can reliably meta-evaluate the personalization quality and whether personalization affects the detectability of the generated texts.

Our results demonstrate the need for stronger safety filters and disclaimers, as those are not properly functioning in most of the evaluated LLMs.

Additionally, our study revealed that personalization actually reduces safety-filter activations, effectively functioning as a jailbreak.

Such behavior must be urgently addressed by LLM developers and service providers.

Published on arXiv on: 2024-12-18T09:48:53Z