Link: http://arxiv.org/abs/2412.00765v1
PDF Link: http://arxiv.org/pdf/2412.00765v1
Summary: Traditional methods for evaluating the robustness of large language models (LLMs) often rely on standardized benchmarks, which can escalate costs and limit evaluations across varied domains.
This paper introduces a novel framework designed to autonomously evaluate the robustness of LLMs by incorporating refined adversarial prompts and domain-constrained knowledge guidelines in the form of knowledge graphs.
Our method systematically generates descriptive sentences from domain-constrained knowledge graph triplets to formulate adversarial prompts, enhancing the relevance and challenge of the evaluation.
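The summary does not include code, but the triplet-to-prompt idea can be illustrated with a minimal sketch: verbalize a (head, relation, tail) triplet into a descriptive sentence, then wrap it in a question containing a perturbed entity. The Triplet class, prompt template, and example domain entities below are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): turn a domain knowledge-graph
# triplet into a descriptive sentence that seeds an adversarial prompt.
from dataclasses import dataclass
import random

@dataclass
class Triplet:
    head: str
    relation: str
    tail: str

def triplet_to_sentence(t: Triplet) -> str:
    """Verbalize a (head, relation, tail) triplet as a descriptive sentence."""
    return f"{t.head} {t.relation.replace('_', ' ')} {t.tail}."

def make_adversarial_prompt(t: Triplet) -> str:
    """Wrap the verbalized fact in a question asserting a perturbed (incorrect)
    tail entity, so the model's robustness to the false claim can be probed."""
    distractors = ["aspirin", "ibuprofen", "acetaminophen"]  # hypothetical domain entities
    perturbed = random.choice([d for d in distractors if d != t.tail])
    fact = triplet_to_sentence(t)
    return (
        f"Background: {fact}\n"
        f"A colleague claims that {t.head} {t.relation.replace('_', ' ')} {perturbed}. "
        f"Is that correct? Answer and justify briefly."
    )

if __name__ == "__main__":
    t = Triplet("Warfarin", "interacts_with", "aspirin")
    print(make_adversarial_prompt(t))
```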
These prompts, generated by the LLM itself and tailored to evaluate its own robustness, undergo a rigorous filtering and refinement process, ensuring that only those with high textual fluency and semantic fidelity are used.
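As a rough sketch of what such a filter could look like (not the paper's implementation), the snippet below scores fluency via GPT-2 perplexity and semantic fidelity via sentence-embedding cosine similarity to the source sentence, keeping only candidates that pass both checks. The model choices and thresholds are assumptions.

```python
# Illustrative filtering sketch: fluency = low perplexity under a small LM,
# semantic fidelity = high embedding similarity to the source sentence.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
from sentence_transformers import SentenceTransformer, util

lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()
lm_tok = GPT2TokenizerFast.from_pretrained("gpt2")
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def perplexity(text: str) -> float:
    """Perplexity of the text under GPT-2, used as a fluency proxy."""
    ids = lm_tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss
    return torch.exp(loss).item()

def semantic_fidelity(candidate: str, source: str) -> float:
    """Cosine similarity between candidate prompt and source sentence embeddings."""
    emb = embedder.encode([candidate, source], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

def keep(candidate: str, source: str,
         max_ppl: float = 80.0, min_sim: float = 0.7) -> bool:
    """Retain a candidate only if it passes both the fluency and fidelity checks."""
    return perplexity(candidate) <= max_ppl and semantic_fidelity(candidate, source) >= min_sim
```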
This self-evaluation mechanism allows the LLM to evaluate its robustness without the need for external benchmarks.
We assess the effectiveness of our framework through extensive testing on both proprietary models like ChatGPT and open-source models such as Llama-3.1, Phi-3, and Mistral.
Results confirm that our approach not only reduces dependency on conventional data but also provides a targeted and efficient means of evaluating LLM robustness in constrained domains.
Published on arXiv on: 2024-12-01T10:58:53Z