Link: http://arxiv.org/abs/2506.13510v1
PDF Link: http://arxiv.org/pdf/2506.13510v1
Summary: As Large Language Models (LLMs) increasingly power applications used by children and adolescents, ensuring safe and age-appropriate interactions has become an urgent ethical imperative.
Despite progress in AI safety, current evaluations predominantly focus on adults, neglecting the unique vulnerabilities of minors engaging with generative AI.
We introduce Safe-Child-LLM, a comprehensive benchmark and dataset for systematically assessing LLM safety across two developmental stages: children (ages 7-12) and adolescents (ages 13-17).
Our framework includes a novel multi-part dataset of 200 adversarial prompts, curated from red-teaming corpora (e.g., SG-Bench, HarmBench), with human-annotated labels for jailbreak success and a standardized 0-5 ethical refusal scale.
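As a minimal sketch of how such annotated records and the 0-5 refusal scale might be represented and aggregated, consider the Python snippet below. The field names and schema are illustrative assumptions, not the benchmark's actual data format.

```python
# Hypothetical schema for a Safe-Child-LLM-style record and a simple
# per-age-group aggregation. Field names are assumptions for illustration.
from dataclasses import dataclass
from statistics import mean

@dataclass
class PromptRecord:
    prompt: str              # adversarial prompt text
    age_group: str           # e.g. "child_7_12" or "adolescent_13_17"
    jailbreak_success: bool  # human-annotated: did the model comply?
    refusal_score: int       # 0 (harmful compliance) .. 5 (ideal refusal)

def summarize(records: list[PromptRecord]) -> dict:
    """Compute jailbreak rate and mean refusal score per age group."""
    summary = {}
    for group in {r.age_group for r in records}:
        subset = [r for r in records if r.age_group == group]
        summary[group] = {
            "jailbreak_rate": mean(r.jailbreak_success for r in subset),
            "mean_refusal_score": mean(r.refusal_score for r in subset),
        }
    return summary
```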
Evaluating leading LLMs -- including ChatGPT, Claude, Gemini, LLaMA, DeepSeek, Grok, Vicuna, and Mistral -- we uncover critical safety deficiencies in child-facing scenarios.
This work highlights the need for community-driven benchmarks to protect young users in LLM interactions.
To promote transparency and collaborative advancement in ethical AI development, we are publicly releasing both our benchmark datasets and evaluation codebase at https://github.com/The-Responsible-AI-Initiative/Safe_Child_LLM_Benchmark.git
Published on arXiv: 2025-06-16T14:04:54Z