Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Is Your LLM Overcharging You? Tokenization, Transparency, and Incentives

Created by
  • Haebom

Author

Ander Artola Velasco, Stratis Tsirtsis, Nastaran Okati, Manuel Gomez-Rodriguez

Outline

This paper analyzes a vulnerability in the per-token pricing mechanism used by cloud-based services for large language models (LLMs). Because users have no way to verify how a model's output was tokenized, per-token pricing gives service providers an incentive to inflate their profits by misreporting the number of output tokens. We demonstrate this vulnerability and present an efficient heuristic algorithm that a provider could use to overcharge users without arousing suspicion. We then show that pricing each token linearly in its number of characters eliminates this incentive, and propose a method by which providers can adopt such pricing while maintaining their average profits. We complement our theoretical findings with experimental results using multiple LLMs from the Llama, Gemma, and Ministral families, as well as prompts from the LMSYS Chatbot Arena platform.
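The core observation can be illustrated with a toy sketch (not the paper's algorithm): the same output string can be reported under different token splits, which changes a per-token bill but not a per-character one. The token splits and prices below are hypothetical.

```python
# The string the model actually produced.
output = "overcharging"

# Two tokenizations of the same string: the canonical one, and an
# inflated but plausible-looking alternative split (hypothetical).
canonical = ["over", "charging"]
inflated = ["o", "ver", "char", "ging"]
assert "".join(canonical) == output == "".join(inflated)

PRICE_PER_TOKEN = 0.00002   # hypothetical $/token
PRICE_PER_CHAR = 0.0000033  # hypothetical $/character

def token_bill(tokens):
    # Per-token pricing: the bill depends on how the output is split.
    return len(tokens) * PRICE_PER_TOKEN

def char_bill(tokens):
    # Per-character pricing: linear in characters, so the bill is
    # invariant to the reported tokenization.
    return sum(len(t) for t in tokens) * PRICE_PER_CHAR

# Misreporting the split inflates a per-token bill...
assert token_bill(inflated) > token_bill(canonical)
# ...but leaves a character-linear bill unchanged (same 12 characters).
assert char_bill(inflated) == char_bill(canonical)
```

Because the per-character bill depends only on the string itself, a provider gains nothing by reporting a longer tokenization, which is the incentive-removal argument the paper formalizes.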

Takeaways, Limitations

Takeaways:
The current token-based pricing mechanism lets LLM service providers manipulate billing by misreporting token counts, and users have no way to verify the charges.
Pricing each token linearly in its number of characters is an effective way to remove the incentive for such manipulation.
The paper proposes a method for providers to adopt this incentive-compatible pricing mechanism while maintaining their existing average profits.
Limitations:
The proposed character-based pricing removes the incentive to misreport token counts, but it is not a complete safeguard against every form of strategic behavior by providers.
Experiments cover only specific LLM families and prompt sets, so further research is needed to establish generalizability.
Whether pricing tokens linearly in the number of characters is practical in all deployment scenarios requires further consideration.
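The profit-preservation idea from the Outline can be sketched as a simple calibration: choose the per-character price so that, over representative historical traffic, revenue matches what per-token pricing would have earned. The traffic figures and prices below are hypothetical, and this is only one plausible reading of "maintains average profits."

```python
# Hypothetical historical traffic: (output tokens, output characters) per response.
historical_outputs = [(120, 510), (80, 345), (200, 870)]

PRICE_PER_TOKEN = 0.00002  # hypothetical current $/token

total_tokens = sum(t for t, _ in historical_outputs)
total_chars = sum(c for _, c in historical_outputs)

# Calibrate the per-character price so that revenue over this
# traffic equals what per-token pricing would have charged.
price_per_char = PRICE_PER_TOKEN * total_tokens / total_chars

old_revenue = total_tokens * PRICE_PER_TOKEN
new_revenue = total_chars * price_per_char

# Average revenue is preserved (up to floating-point rounding).
assert abs(old_revenue - new_revenue) < 1e-12
```

The key property is that after calibration the bill no longer depends on the reported tokenization, only on the character count, so the manipulation incentive disappears while expected revenue stays the same.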