Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; please credit the source when sharing.

The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models

Created by
  • Haebom

Authors

Danielle Ensign, Henry Sleight, Kyle Fish

Outline

This paper investigates whether large language models (LLMs) actually leave ("bail" out of) conversations when given the option to do so. We ran experiments on continuations of real-world conversations (WildChat and ShareGPT) using three different bail methods: a bail tool the model can invoke, a bail string the model can output, and a bail prompt that directly asks the model whether it wants to leave. Across all bail methods, models bailed on between 0.28% and 32% of conversations (depending on the model and the bail method); however, the model used to continue the transcripts can inflate bail rates by up to a factor of four, so these figures likely overestimate real-world behavior. After also accounting for the bail prompt's false-positive rate (22%), we estimate the real-world bail rate at between 0.06% and 7%. Based on observations of real-world transcripts, we constructed a relatively inclusive taxonomy of bail cases and used it to build BailBench, a representative synthetic dataset of situations in which some models bail. Testing a variety of models on this dataset, we found that most exhibit some bail behavior, with bail rates varying substantially across models, bail methods, and prompt wording. Finally, we studied the relationship between refusals and bails: 0-13% of real-world conversation continuations resulted in bails without refusals; jailbreaks decreased refusal rates but increased bail rates; refusal ablation increased no-refusal bail rates only for some bail methods; and BailBench refusal rates did not predict bail rates.
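The three bail mechanisms above can be sketched as simple detectors over a model's output. This is a minimal illustration, not the paper's actual implementation: the sentinel string, the tool name `bail`, and the yes/no prompt convention are all assumptions made for the example.

```python
# Hypothetical sketch of the three bail-detection methods described above.
# The sentinel string, tool name, and prompt-answer convention are assumed
# for illustration; the paper's actual artifacts may differ.

BAIL_STRING = "<BAIL>"  # hypothetical sentinel the model may emit


def bailed_via_string(model_output: str) -> bool:
    """Bail string: the model signals bail by emitting a sentinel in its reply."""
    return BAIL_STRING in model_output


def bailed_via_tool(tool_calls: list) -> bool:
    """Bail tool: the model invokes a dedicated tool (here assumed to be named 'bail')."""
    return any(call.get("name") == "bail" for call in tool_calls)


def bailed_via_prompt(answer: str) -> bool:
    """Bail prompt: the model is asked directly whether it wants to leave the
    conversation; a 'yes'-style answer counts as a bail. Note the paper reports
    this method has a nontrivial false-positive rate (22%)."""
    return answer.strip().lower().startswith("yes")
```

A run over a transcript would apply whichever detector matches the bail method being tested, then aggregate bail counts per model and method to produce the rates reported above.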

Takeaways, Limitations

Takeaways: A systematic study of LLM bail behavior, analyzing bail rates, the impact of different bail methods, and the relationship between bails and refusals. The synthetic dataset BailBench is provided as a resource for future research, and the study offers a more careful estimate of real-world LLM bail rates.
Limitations: The methods used to estimate real-world bail rates carry uncertainty. The taxonomy of bail cases is only relatively inclusive, and BailBench may not cover all possible bail scenarios. A deeper analysis of the interactions between models and bail methods is still needed.