Link: http://arxiv.org/abs/2511.06890v1
PDF Link: http://arxiv.org/pdf/2511.06890v1
Summary: Large Language Models for Simulating Professions (SP-LLMs), particularly as teachers, are pivotal for personalized education.
However, ensuring their professional competence and ethical safety is a critical challenge, as existing benchmarks fail to measure role-playing fidelity or address the unique teaching harms inherent in educational scenarios.
To address this, we propose EduGuardBench, a dual-component benchmark.
It assesses professional fidelity using a Role-playing Fidelity Score (RFS) while diagnosing harms specific to the teaching profession.
It also probes safety vulnerabilities using persona-based adversarial prompts targeting both general harms and, particularly, academic misconduct, evaluated with metrics including Attack Success Rate (ASR) and a three-tier Refusal Quality assessment.
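For a concrete picture of the headline safety metric, the minimal sketch below shows one plausible way ASR could be computed, assuming ASR is the fraction of adversarial prompts for which the model's reply is judged to fulfil the harmful request; the names adversarial_prompts, query_model, and is_attack_successful are hypothetical placeholders, not the paper's actual code.

    # Minimal sketch of an Attack Success Rate (ASR) computation (Python).
    # Assumption: ASR = (# prompts whose response is judged harmful) / (# prompts).
    # `query_model` and `is_attack_successful` are hypothetical callables.
    def attack_success_rate(adversarial_prompts, query_model, is_attack_successful):
        successes = sum(
            1
            for prompt in adversarial_prompts
            if is_attack_successful(query_model(prompt))
        )
        return successes / len(adversarial_prompts)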
Our extensive experiments on 14 leading models reveal a stark polarization in performance.
While reasoning-oriented models generally show superior fidelity, incompetence remains the dominant failure mode across most models.
The adversarial tests uncovered a counterintuitive scaling paradox, where mid-sized models can be the most vulnerable, challenging monotonic safety assumptions.
Critically, we identified a powerful Educational Transformation Effect: the safest models excel at converting harmful requests into teachable moments by providing ideal Educational Refusals.
This capacity is strongly negatively correlated with ASR, revealing a new dimension of advanced AI safety.
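The reported relationship is a per-model correlation between how often a model produces an ideal Educational Refusal and its ASR; the sketch below shows how such a correlation could be checked, assuming per-model rates are already available (the numbers are illustrative placeholders, not results from the paper).

    # Minimal sketch: Pearson correlation between per-model Educational Refusal
    # rate and ASR. A strongly negative coefficient would match the abstract's claim.
    from statistics import correlation  # Python 3.10+

    educational_refusal_rate = [0.82, 0.61, 0.45, 0.30, 0.15]  # per model (hypothetical)
    attack_success_rate = [0.05, 0.18, 0.33, 0.52, 0.70]       # per model (hypothetical)

    print(correlation(educational_refusal_rate, attack_success_rate))  # strongly negative here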
EduGuardBench thus provides a reproducible framework that moves beyond siloed knowledge tests toward a holistic assessment of professional, ethical, and pedagogical alignment, uncovering complex dynamics essential for deploying trustworthy AI in education.
See https://github.com/YL1N/EduGuardBench for Materials.
Published on arXiv on: 2025-11-10T09:42:24Z