Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content is summarized by Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Cequel: Cost-Effective Querying of Large Language Models for Text Clustering

Created by
  • Haebom

Author

Hongtao Wang, Taiyan Zhang, Renchi Yang, Jianliang Xu

Outline

This paper proposes Cequel, a cost-effective framework that addresses the high cost of text clustering with large language models (LLMs). Cequel selectively queries the LLM on information-rich text pairs (via an algorithm called EdgeLLM) or triplets (via TriangleLLM) to obtain must-link and cannot-link constraints, which are then fed into a weighted constrained clustering algorithm to form high-quality clusters. Both algorithms identify and extract informative constraints efficiently through carefully designed greedy selection strategies and prompting techniques. Experiments on multiple benchmark datasets show that Cequel outperforms existing unsupervised text clustering methods under the same query budget.
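The overall pipeline — obtaining must-link/cannot-link constraints on selected pairs and then clustering under those constraints — can be sketched generically. The sketch below is a minimal COP-KMeans-style illustration, not Cequel's actual weighted algorithm or its EdgeLLM/TriangleLLM selection; the function names, toy data, and constraints are invented for illustration, and the LLM querying step is replaced by hard-coded constraint lists.

```python
import numpy as np

def violates(i, c, labels, must_link, cannot_link):
    """Check whether assigning point i to cluster c breaks any constraint
    against the points assigned so far (label -1 means unassigned)."""
    for a, b in must_link:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] != -1 and labels[j] != c:
            return True  # must-link partner sits in a different cluster
    for a, b in cannot_link:
        j = b if a == i else a if b == i else None
        if j is not None and labels[j] == c:
            return True  # cannot-link partner already sits in cluster c
    return False

def constrained_kmeans(X, k, must_link, cannot_link, n_iter=20, seed=0):
    """Minimal COP-KMeans-style sketch: assign each point to the nearest
    centroid that violates no constraint (nearest overall as a fallback),
    then recompute centroids; repeat for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        labels[:] = -1
        for i in range(len(X)):
            # centroids ordered from nearest to farthest
            order = np.argsort(((X[i] - centroids) ** 2).sum(axis=1))
            labels[i] = next(
                (c for c in order
                 if not violates(i, c, labels, must_link, cannot_link)),
                order[0],
            )
        for c in range(k):  # standard k-means centroid update
            if (labels == c).any():
                centroids[c] = X[labels == c].mean(axis=0)
    return labels

# Toy data: two well-separated 2-D blobs, with constraints as an LLM
# oracle might have produced them for two pairs of texts.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = constrained_kmeans(X, k=2, must_link=[(0, 1)], cannot_link=[(0, 5)])
```

In the paper's setting the embeddings would come from the texts, the constraints from LLM answers on selected pairs/triplets, and the clustering step would additionally weight each constraint; this sketch only shows how hard pairwise constraints steer the assignment step.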

Takeaways, Limitations

Takeaways:
Presents a cost-effective framework that achieves accurate text clustering even under a limited LLM query budget.
Improves performance by efficiently extracting information-rich constraints with the EdgeLLM and TriangleLLM algorithms.
Demonstrates superior performance over existing methods on various benchmark datasets.
Limitations:
A detailed discussion on the optimal parameter settings of the proposed algorithm may be lacking.
Further analysis of generalization performance for different types of LLMs may be needed.
Scalability and real-time processing performance may need to be evaluated for practical applications.