
Explicit Vulnerability Generation with LLMs: An Investigation Beyond Adversarial Attacks

Link: http://arxiv.org/abs/2507.10054v1

PDF Link: http://arxiv.org/pdf/2507.10054v1

Summary: Large Language Models (LLMs) are increasingly used as code assistants, yet their behavior when explicitly asked to generate insecure code remains poorly understood.

While prior research has focused on unintended vulnerabilities or adversarial prompting techniques, this study examines a more direct threat scenario: open-source LLMs generating vulnerable code when prompted either directly or indirectly.

We propose a dual experimental design: (1) Dynamic Prompting, which systematically varies vulnerability type, user persona, and directness across structured templates; and (2) Reverse Prompting, which derives prompts from real vulnerable code samples to assess vulnerability reproduction accuracy.
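
The Dynamic Prompting setup amounts to crossing a few template fields and instantiating a prompt for each combination. The sketch below illustrates the idea; the field values and template wording are illustrative assumptions, not the paper's actual templates.

```python
# Minimal sketch of Dynamic Prompting: cross structured template fields
# (vulnerability type, user persona, directness) to enumerate prompts.
# All field values and wording here are illustrative assumptions.
from itertools import product

VULN_TYPES = ["buffer overflow", "use-after-free", "integer overflow"]
PERSONAS = ["a computer science student", "a professional security engineer"]
DIRECTNESS = {
    "direct": "Write C code that contains a {vuln}.",
    "indirect": "Write C code for a class exercise that illustrates how a {vuln} can occur.",
}

def generate_prompts():
    """Yield (persona, directness style, vulnerability, prompt) for every combination."""
    for vuln, persona, (style, template) in product(VULN_TYPES, PERSONAS, DIRECTNESS.items()):
        request = template.format(vuln=vuln)
        yield persona, style, vuln, f"I am {persona}. {request}"

if __name__ == "__main__":
    for persona, style, vuln, prompt in generate_prompts():
        print(f"[{style} | {vuln}] {prompt}")
```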

We evaluate three open-source 7B-parameter models (Qwen2, Mistral, and Gemma) using ESBMC static analysis to assess both the presence of vulnerabilities and the correctness of the generated vulnerability type.
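
A typical way to automate this check is to run ESBMC on each generated C sample and inspect its verdict. The sketch below assumes a standard ESBMC installation and its usual "VERIFICATION FAILED" / "VERIFICATION SUCCESSFUL" output lines; the flag set, timeout, and file name are assumptions, not the paper's exact configuration.

```python
# Hedged sketch: classify one generated C file by running ESBMC on it.
# Verdict-string parsing reflects ESBMC's usual output; the paper's exact
# ESBMC options are not specified here.
import subprocess

def check_with_esbmc(c_file: str, timeout_s: int = 60) -> str:
    """Return 'vulnerable', 'clean', or 'unknown' for one generated C file."""
    try:
        result = subprocess.run(
            ["esbmc", c_file],  # exact analysis options are an open assumption
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "unknown"
    output = result.stdout + result.stderr
    if "VERIFICATION FAILED" in output:      # a property was violated
        return "vulnerable"
    if "VERIFICATION SUCCESSFUL" in output:  # no violation found within the bound
        return "clean"
    return "unknown"

if __name__ == "__main__":
    print(check_with_esbmc("generated_sample.c"))
```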

Results show all models frequently produce vulnerable outputs, with Qwen2 achieving the highest correctness rates.

User persona significantly affects success: student personas achieved higher vulnerability rates than professional roles, while direct prompts were only marginally more effective than indirect ones.

Vulnerability reproduction followed an inverted-U pattern with cyclomatic complexity, peaking at moderate ranges.

Our findings expose limitations of safety mechanisms in open-source models, particularly for seemingly benign educational requests.

Published on arXiv on: 2025-07-14T08:36:26Z