Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

From Intent Discovery to Recognition with Topic Modeling and Synthetic Data

Created by
  • Haebom

Authors

Aaron Rodrigues, Mahmood Hegazy, Azzam Naeem

Outline

In this paper, we propose an agentic LLM framework for AI systems that must understand and recognize customer intent in domains characterized by short utterances and cold-start problems. To overcome the limitations of existing methods, we expand 36 general user intents into 278 fine-grained intents through hierarchical topic modeling and intent discovery. We also generate synthetic user queries to augment real utterances and reduce the dependency on human annotation, especially in low-resource settings. LLM-based topic modeling combined with targeted use of synthetic utterances improves the variability and coverage of the dataset, yielding a comprehensive framework for discovering and recognizing novel customer intents online. In particular, few-shot prompting improves the quality and usability of the synthetic queries, and we show that LLM-generated intent descriptions and keywords can effectively replace human-written ones.
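
As a rough illustration of the few-shot prompting step described above, the Python sketch below assembles a prompt for one intent from its description, keywords, and any available real utterances, then cleans up a raw completion. The summary does not give the paper's actual prompts or code, so everything here is an assumption made for illustration: the Intent class, its field names, the prompt wording, and both helper functions are hypothetical, and no particular LLM API is called.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """A fine-grained intent. All field names are illustrative assumptions."""
    name: str
    description: str
    keywords: list[str] = field(default_factory=list)
    examples: list[str] = field(default_factory=list)  # real utterances, if any


def build_few_shot_prompt(intent: Intent, n_queries: int = 5, max_shots: int = 3) -> str:
    """Assemble a few-shot prompt asking an LLM for synthetic queries.

    In a cold-start case (no real utterances), the example section is
    omitted and the prompt relies on the description and keywords alone.
    """
    shots = intent.examples[:max_shots]
    lines = [
        "You write short, realistic customer queries.",
        f"Intent: {intent.name}",
        f"Description: {intent.description}",
        f"Keywords: {', '.join(intent.keywords)}",
    ]
    if shots:
        lines.append("Example queries:")
        lines.extend(f"- {s}" for s in shots)
    lines.append(f"Write {n_queries} new queries for this intent, one per line.")
    return "\n".join(lines)


def parse_queries(llm_output: str) -> list[str]:
    """Split a raw completion into queries, dropping bullets, blanks, and
    case-insensitive duplicates."""
    seen, queries = set(), []
    for raw in llm_output.splitlines():
        q = raw.strip().lstrip("-•* ").strip()
        if q and q.lower() not in seen:
            seen.add(q.lower())
            queries.append(q)
    return queries
```

The string returned by build_few_shot_prompt would be sent to whichever LLM is in use, and parse_queries applied to the reply. The resulting queries could then be mixed with real utterances when training or evaluating an intent recognizer.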

Takeaways, Limitations

Takeaways:
We demonstrate that LLM-based hierarchical topic modeling can significantly improve the granularity and diversity of customer intents.
Synthetic query generation can help address cold-start problems and improve dataset variability and coverage.
LLM-generated intent descriptions and keywords perform on par with human-written ones, suggesting potential savings in labor and time.
We present an agentic LLM framework for effectively discovering and recognizing new customer intents online.
Limitations:
The proposed framework has not been validated in a live service environment.
Because the framework depends heavily on LLM performance, the limitations of the underlying LLM may carry over to its results.
Objective evaluation criteria and metrics for the quality of synthetic query data are lacking.
Further research is needed on generalizability across different domains and languages.