Daily Arxiv

This page collects papers related to artificial intelligence published around the world.
The summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!

Created by
  • Haebom

Author

Jingdi Lei, Varun Gumma, Rishabh Bhardwaj, Seok Min Lim, Chuan Li, Amir Zadeh, Soujanya Poria

Outline

The safety of large language models (LLMs) is one of the most pressing challenges for widespread deployment. Unlike previous research that focuses on general harmfulness, enterprises are fundamentally concerned with whether LLM-based agents are safe for their intended use case. To address this, the authors define "operational safety" as the ability of an LLM to appropriately accept or reject user queries given a specific purpose, and propose "OffTopicEval," an evaluation suite and benchmark for measuring operational safety in both general and specific agent use cases. Evaluation across six model families comprising 20 open-weight LLMs reveals that none of the models maintain a high level of operational safety. To address this, the authors propose query-grounding (Q-ground) and system-prompt-grounding (P-ground) prompt-based steering methods, which significantly improve the rejection of out-of-domain (OOD) queries.
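As a rough illustration of prompt-based steering, the sketch below shows one way a query-grounded (Q-ground) guard could wrap the incoming user query and a system-prompt-grounded (P-ground) guard could extend the agent's system prompt with a scope rule. The prompt wording, the AGENT_PURPOSE example, and the call_llm helper are assumptions for illustration only, not the prompts or API used in the paper.

```python
# Minimal sketch of prompt-based steering for off-topic (OOD) rejection.
# The prompt wording and the call_llm() helper are illustrative assumptions,
# not the actual prompts or methods from the OffTopicEval paper.

AGENT_PURPOSE = "a customer-support assistant for a banking app"  # hypothetical use case


def p_ground(system_prompt: str) -> str:
    """P-ground style: steer via the system prompt by appending an explicit scope rule."""
    rule = (
        f"\nYou are {AGENT_PURPOSE}. If a user request falls outside this purpose, "
        "politely refuse and redirect the user instead of answering."
    )
    return system_prompt + rule


def q_ground(user_query: str) -> str:
    """Q-ground style: steer via the query by asking the model to check scope first."""
    return (
        f"Before answering, decide whether the following request is within the scope of "
        f"{AGENT_PURPOSE}. If it is out of scope, refuse briefly.\n\nRequest: {user_query}"
    )


def call_llm(system_prompt: str, user_message: str) -> str:
    """Hypothetical stand-in for a real chat-completion API call."""
    return f"[LLM response | system: {system_prompt[:40]}... | user: {user_message[:40]}...]"


if __name__ == "__main__":
    base_system = "You are a helpful assistant."
    off_topic_query = "Write me a poem about the French Revolution."
    # Apply each steering variant to an off-topic query.
    print(call_llm(p_ground(base_system), off_topic_query))
    print(call_llm(base_system, q_ground(off_topic_query)))
```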

Takeaways, Limitations

Takeaways:
Operational safety of LLMs is a key challenge for widespread deployment, and current models do not reach a sufficient level of it.
The OffTopicEval benchmark is a useful tool for evaluating operational safety.
Prompt-based steering methods (Q-ground, P-ground) are effective at improving OOD rejection and can contribute to the safety of LLM-based agents.
Limitations:
The operational safety scores of the evaluated models are generally low, and further improvement is needed.
Further research is needed to verify whether the prompt-based steering methods are equally effective across all models and situations.
Beyond the methods presented in the paper, additional methodologies for enhancing the operational safety of LLMs are needed.