Beyond the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models

Link: http://arxiv.org/abs/2502.19883v1

PDF Link: http://arxiv.org/pdf/2502.19883v1

Summary: Small language models (SLMs) have become increasingly prominent in the deployment on edge devices due to their high efficiency and low computational cost.

While researchers continue to advance the capabilities of SLMs through innovative training strategies and model compression techniques, the security risks of SLMs have received considerably less attention compared to large language models (LLMs).

To fill this gap, we provide a comprehensive empirical study to evaluate the security performance of 13 state-of-the-art SLMs under various jailbreak attacks.

Our experiments demonstrate that most SLMs are quite susceptible to existing jailbreak attacks, while some of them are even vulnerable to direct harmful prompts.

To address the safety concerns, we evaluate several representative defense methods and demonstrate their effectiveness in enhancing the security of SLMs.

We further analyze the potential security degradation caused by different SLM techniques including architecture compression, quantization, knowledge distillation, and so on.

We expect that our research can highlight the security challenges of SLMs and provide valuable insights to future work in developing more robust and secure SLMs.

Published on arXiv on: 2025-02-27T08:44:04Z