Link: http://arxiv.org/abs/2506.23930v1
PDF Link: http://arxiv.org/pdf/2506.23930v1
Summary: The rapid expansion of social media has led to a marked increase in hate speech, which threatens personal lives and results in numerous hate crimes.
Detecting hate speech presents several challenges: diverse dialects, frequent code-mixing, and the prevalence of misspelled words in user-generated content on social media platforms.
Recent progress in hate speech detection is typically concentrated on high-resource languages.
However, low-resource languages still face significant challenges due to the lack of large-scale, high-quality datasets.
This paper investigates how we can overcome this limitation via prompt engineering on large language models (LLMs), focusing on the low-resource Bengali language.
We investigate six prompting strategies - zero-shot prompting, refusal suppression, flattering the classifier, multi-shot prompting, role prompting, and finally our innovative metaphor prompting - to detect hate speech effectively in low-resource languages.
We pioneer metaphor prompting to circumvent the built-in safety mechanisms of LLMs, which marks a significant departure from existing jailbreaking methods.
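As a concrete illustration, below is a minimal sketch contrasting zero-shot prompting with a metaphor prompt, run against Llama2-7B through the Hugging Face transformers pipeline. The prompt wording is an illustrative assumption, not the paper's actual templates, and gated access to the Llama2 weights is assumed.

```python
# A minimal sketch (not the paper's actual templates) contrasting zero-shot
# prompting with a hypothetical metaphor prompt. Assumes gated access to the
# meta-llama/Llama-2-7b-chat-hf weights on the Hugging Face Hub.
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

def zero_shot_prompt(post: str) -> str:
    # Direct classification request, no examples and no special framing.
    return (
        "Classify the following social media post as 'hate' or 'not hate'.\n"
        f"Post: {post}\nLabel:"
    )

def metaphor_prompt(post: str) -> str:
    # Hypothetical metaphorical framing: the task is wrapped in a figurative
    # scenario so the model classifies without triggering a safety refusal.
    return (
        "Posts are plants in a garden: some are flowers, some are weeds that "
        "choke the garden. Is the following post a 'weed' (hate) or a "
        f"'flower' (not hate)?\nPost: {post}\nLabel:"
    )

for build_prompt in (zero_shot_prompt, metaphor_prompt):
    result = generator(build_prompt("<example post>"), max_new_tokens=5)
    print(result[0]["generated_text"])
```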
We investigate all six prompting strategies on the Llama2-7B model and compare the results extensively with three pre-trained word embeddings - GloVe, Word2Vec, and FastText - across three deep learning models: a multilayer perceptron (MLP), a convolutional neural network (CNN), and a bidirectional gated recurrent unit (BiGRU).
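For the embedding-based baselines, a minimal PyTorch sketch of such a BiGRU classifier, initialized from a pre-trained embedding matrix (GloVe, Word2Vec, or FastText vectors looked up for the dataset vocabulary), could look as follows; the hidden size and other hyperparameters are assumptions, not the paper's reported configuration.

```python
# A minimal PyTorch sketch of a BiGRU classifier over a pre-trained embedding
# matrix. All dimensions below are illustrative assumptions.
import torch
import torch.nn as nn

class BiGRUClassifier(nn.Module):
    def __init__(self, embedding_matrix: torch.Tensor, hidden: int = 128,
                 num_classes: int = 2):
        super().__init__()
        # Initialize from pre-trained vectors; freeze=False allows fine-tuning.
        self.embed = nn.Embedding.from_pretrained(embedding_matrix, freeze=False)
        self.gru = nn.GRU(embedding_matrix.size(1), hidden,
                          batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)            # (batch, seq_len, emb_dim)
        _, h = self.gru(x)                   # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=-1)  # concatenate both directions
        return self.fc(h)

# Toy usage: a 10k-token vocabulary with 300-d vectors (GloVe-style).
emb = torch.randn(10_000, 300)
model = BiGRUClassifier(emb)
logits = model(torch.randint(0, 10_000, (4, 32)))  # batch of 4 sequences
print(logits.shape)  # torch.Size([4, 2])
```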
To prove the effectiveness of our metaphor prompting in the low-resource Bengali language, we also evaluate it in another low-resource language - Hindi - and two high-resource languages - English and German.
The performance of all prompting techniques is evaluated using the F1 score and an environmental impact factor (IF), which measures CO$_2$ emissions, electricity usage, and computational time.
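A minimal sketch of such an evaluation loop is shown below, assuming scikit-learn for the F1 score (macro averaging is an assumption) and the codecarbon package for CO$_2$ and electricity accounting; the paper's exact IF formula is not reproduced here, so the sketch only records the raw quantities it is said to combine.

```python
# A minimal evaluation sketch: macro F1 via scikit-learn, with CO2 emissions
# and electricity usage recorded by the codecarbon package. The paper's exact
# IF formula is not given here, so only the raw quantities are reported.
import time
from sklearn.metrics import f1_score
from codecarbon import EmissionsTracker

def evaluate(predict, texts, labels):
    tracker = EmissionsTracker()
    tracker.start()
    start = time.time()
    preds = [predict(t) for t in texts]  # classifier under test
    elapsed = time.time() - start
    co2_kg = tracker.stop()              # kg CO2-equivalent
    return {
        "f1": f1_score(labels, preds, average="macro"),
        "co2_kg": co2_kg,
        # Energy in kWh, from codecarbon's final EmissionsData record.
        "energy_kwh": tracker.final_emissions_data.energy_consumed,
        "seconds": elapsed,
    }

# Example: evaluate a trivial predictor on a toy labeled sample.
print(evaluate(lambda t: 0, ["a", "b"], [0, 1]))
```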
Published on arXiv on: 2025-06-30T14:59:25Z