Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

TalkPlayData 2: An Agentic Synthetic Data Pipeline for Multimodal Conversational Music Recommendation

Created by
  • Haebom

Authors

Keunwoo Choi, Seungheon Doh, Juhan Nam

Outline

TalkPlayData 2 is a synthetic dataset for multimodal conversational music recommendation, generated by an agentic data pipeline. The pipeline creates multiple large language model (LLM) agents with different roles and specialized prompts, and obtains chat data by logging the conversation between the Listener LLM and the Recsys LLM. To cover diverse conversation scenarios, the Listener LLM in each conversation is conditioned on a fine-tuned conversation goal. All LLMs are multimodal, handling audio and images, which enables simulation of multimodal recommendation and conversation. In LLM-as-a-judge and subjective evaluation experiments, TalkPlayData 2 achieved its intended goals in various aspects relevant to training a generative recommendation model for music.
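The logging loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the agents here are plain Python stubs standing in for LLM calls, and every name (`Turn`, `make_listener`, `make_recsys`, `simulate_conversation`) is hypothetical. Audio and image inputs are omitted entirely; only the turn-taking and logging structure is shown.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    speaker: str  # "listener" or "recsys"
    text: str


# Hypothetical agent type: receives the conversation so far, returns the next message.
Agent = Callable[[List[Turn]], str]


def make_listener(goal: str) -> Agent:
    """Stub listener conditioned on a conversation goal (stands in for an LLM call)."""
    def listener(history: List[Turn]) -> str:
        if not history:
            return f"I'm looking for: {goal}"
        return f"Thanks. Still aiming for: {goal}"
    return listener


def make_recsys(catalog: List[str]) -> Agent:
    """Stub recommender that cycles through a catalog (stands in for an LLM call)."""
    def recsys(history: List[Turn]) -> str:
        # Count earlier recommender turns to choose the next track.
        n = sum(1 for t in history if t.speaker == "recsys")
        return f"How about '{catalog[n % len(catalog)]}'?"
    return recsys


def simulate_conversation(listener: Agent, recsys: Agent, num_rounds: int) -> List[Turn]:
    """Alternate listener and recommender turns, logging each message in order."""
    log: List[Turn] = []
    for _ in range(num_rounds):
        log.append(Turn("listener", listener(log)))
        log.append(Turn("recsys", recsys(log)))
    return log
```

In this sketch the returned log is the chat data for one conversation; passing a different goal string to `make_listener` for each run is how the sketch mimics conditioning each conversation on its own goal.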

Takeaways, Limitations

Takeaways:
A multimodal conversational music recommendation dataset generated with an agent-based pipeline
Coverage of diverse conversation scenarios
Recommendation and conversation simulation using multimodal LLMs
Can be used to train generative recommendation models for music
TalkPlayData 2 and its generation code are released
Limitations:
No specific limitations are mentioned in the abstract