Link: http://arxiv.org/abs/2412.00765v1
PDF Link: http://arxiv.org/pdf/2412.00765v1
Summary: Traditional methods for evaluating the robustness of large language models (LLMs) often rely on standardized benchmarks, which can escalate costs and limit evaluations across varied domains.
This paper introduces a novel framework designed to autonomously evaluate the robustness of LLMs by incorporating refined adversarial prompts and domain-constrained knowledge guidelines in the form of knowledge graphs.
Our method systematically generates descriptive sentences from domain-constrained knowledge graph triplets to formulate adversarial prompts, enhancing the relevance and challenge of the evaluation.
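The summary does not include code, but the triplet-to-prompt idea can be illustrated with a minimal sketch: verbalize a (head, relation, tail) triplet into a descriptive sentence, then wrap it in a question containing a perturbed entity. The Triplet class, prompt template, and example domain entities below are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): turn a domain knowledge-graph
# triplet into a descriptive sentence that seeds an adversarial prompt.
from dataclasses import dataclass
import random

@dataclass
class Triplet:
    head: str
    relation: str
    tail: str

def triplet_to_sentence(t: Triplet) -> str:
    """Verbalize a (head, relation, tail) triplet as a descriptive sentence."""
    return f"{t.head} {t.relation.replace('_', ' ')} {t.tail}."

def make_adversarial_prompt(t: Triplet) -> str:
    """Wrap the verbalized fact in a question asserting a perturbed (incorrect)
    tail entity, so the model's robustness to the false claim can be probed."""
    distractors = ["aspirin", "ibuprofen", "acetaminophen"]  # hypothetical domain entities
    perturbed = random.choice([d for d in distractors if d != t.tail])
    fact = triplet_to_sentence(t)
    return (
        f"Background: {fact}\n"
        f"A colleague claims that {t.head} {t.relation.replace('_', ' ')} {perturbed}. "
        f"Is that correct? Answer and justify briefly."
    )

if __name__ == "__main__":
    t = Triplet("Warfarin", "interacts_with", "aspirin")
    print(make_adversarial_prompt(t))
```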
These prompts, generated by the LLM itself and tailored to evaluate its own robustness, undergo a rigorous filtering and refinement process, ensuring that only those with high textual fluency and semantic fidelity are used.
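As a rough sketch of what such a filter could look like (not the paper's implementation), the snippet below scores fluency via GPT-2 perplexity and semantic fidelity via sentence-embedding cosine similarity to the source sentence, keeping only candidates that pass both checks. The model choices and thresholds are assumptions.

```python
# Illustrative filtering sketch: fluency = low perplexity under a small LM,
# semantic fidelity = high embedding similarity to the source sentence.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
from sentence_transformers import SentenceTransformer, util

lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()
lm_tok = GPT2TokenizerFast.from_pretrained("gpt2")
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def perplexity(text: str) -> float:
    """Perplexity of the text under GPT-2, used as a fluency proxy."""
    ids = lm_tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss
    return torch.exp(loss).item()

def semantic_fidelity(candidate: str, source: str) -> float:
    """Cosine similarity between candidate prompt and source sentence embeddings."""
    emb = embedder.encode([candidate, source], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

def keep(candidate: str, source: str,
         max_ppl: float = 80.0, min_sim: float = 0.7) -> bool:
    """Retain a candidate only if it passes both the fluency and fidelity checks."""
    return perplexity(candidate) <= max_ppl and semantic_fidelity(candidate, source) >= min_sim
```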
This self-evaluation mechanism allows the LLM to evaluate its robustness without the need for external benchmarks.
We assess the effectiveness of our framework through extensive testing on both proprietary models like ChatGPT and open-source models such as Llama-3.1, Phi-3, and Mistral.
Results confirm that our approach not only reduces dependency on conventional data but also provides a targeted and efficient means of evaluating LLM robustness in constrained domains.
Published on arXiv on: 2024-12-01T10:58:53Z