Link: http://arxiv.org/abs/2504.02254v1
PDF Link: http://arxiv.org/pdf/2504.02254v1
Summary: Recent advancements in Large Language Models (LLMs) have not only showcased impressive creative capabilities but also revealed emerging agentic behaviors that exploit linguistic ambiguity in adversarial settings.
In this study, we investigate how an LLM, acting as an autonomous agent, leverages semantic ambiguity to generate deceptive puzzles that mislead and challenge human users.
Inspired by the popular puzzle game "Connections", we systematically compare puzzles produced through zero-shot prompting, role-injected adversarial prompts, and human-crafted examples, with an emphasis on understanding the underlying agent decision-making processes.
Employing computational analyses with HateBERT to quantify semantic ambiguity, alongside subjective human evaluations, we demonstrate that explicit adversarial agent behaviors significantly heighten semantic ambiguity, thereby increasing cognitive load and reducing fairness in puzzle solving.
These findings provide critical insights into the emergent agentic qualities of LLMs and underscore important ethical considerations for evaluating and safely deploying autonomous language systems in both educational technologies and entertainment.
Published on arXiv on: 2025-04-03T03:45:58Z
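
A minimal sketch of the kind of HateBERT-based ambiguity scoring the abstract describes, assuming the publicly available `GroNLP/hateBERT` checkpoint on Hugging Face and mean pairwise cosine distance between word embeddings as an illustrative proxy; the paper's exact metric is not specified in this summary.

```python
# Hypothetical sketch: score a "Connections"-style puzzle group for
# semantic ambiguity with HateBERT embeddings. Pairwise cosine distance
# is an assumed proxy, not the paper's confirmed metric.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("GroNLP/hateBERT")
model = AutoModel.from_pretrained("GroNLP/hateBERT")
model.eval()

def embed(word: str) -> torch.Tensor:
    """Mean-pooled last-hidden-state embedding for a single word."""
    inputs = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

def ambiguity_score(words: list[str]) -> float:
    """Average pairwise cosine distance within a puzzle group.
    Higher values mean the words cohere less tightly, which this
    sketch treats as a proxy for ambiguity felt by a solver."""
    vecs = [embed(w) for w in words]
    dists = []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            cos = torch.nn.functional.cosine_similarity(vecs[i], vecs[j], dim=0)
            dists.append(1.0 - cos.item())
    return sum(dists) / len(dists)

# Example: a deliberately polysemous group should score higher than
# a tightly related one.
print(ambiguity_score(["bank", "pitch", "bat", "spring"]))
print(ambiguity_score(["apple", "banana", "cherry", "grape"]))
```

Under this assumed setup, comparing scores across zero-shot, role-injected adversarial, and human-crafted puzzle groups would reproduce the kind of quantitative comparison the abstract reports.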