
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Created by
  • Haebom

Author

Yixiao Zhang, Yukara Ikemiya, Woosung Choi, Naoki Murata, Marco A. Martínez-Ramírez, Liwei Lin, Gus Xia, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon

Outline

This paper presents Instruct-MusicGen, a method that fine-tunes the existing MusicGen model for efficient text-based music editing. Existing text-to-music editing approaches either train a task-specific model from scratch, which is resource-intensive, or rely on a large language model to predict the edited music, which yields inaccurate audio reconstruction. Instruct-MusicGen adds a text fusion module and an audio fusion module so the model can process text instructions and audio input simultaneously and generate the desired edited music. Although it adds only 8% new parameters to the original MusicGen and is trained for only 5,000 steps, it outperforms existing methods and matches the performance of models specialized for specific tasks. This improves the efficiency of text-to-music editing and broadens the applicability of music language models in dynamic music production environments.
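The fusion idea can be illustrated with a toy sketch: keep the backbone's hidden states frozen and train only a small cross-attention projection that injects instruction embeddings into the music token stream. This is a minimal, single-head simplification with made-up dimensions, not the paper's actual architecture; all names and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, Wq, Wk, Wv):
    # Music-token queries attend over instruction (text) embeddings.
    Q, K, V = queries @ Wq, context @ Wk, context @ Wv
    scores = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return scores @ V

d = 16                               # toy hidden size
music = rng.normal(size=(10, d))     # frozen backbone hidden states (10 music tokens)
text = rng.normal(size=(4, d))       # instruction embeddings (4 text tokens)

# Only these small projections would be trained; the backbone stays frozen,
# which is how the parameter overhead stays small.
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))

fused = music + cross_attention(music, text, Wq, Wk, Wv)  # residual fusion
print(fused.shape)
```

With a residual connection like this, the untrained adapter initially leaves the backbone's behavior nearly unchanged, so fine-tuning can start from the pretrained model's capabilities.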

Takeaways and Limitations

Takeaways:
Solves the resource-intensiveness of existing text-to-music editing models: achieves high performance with far fewer resources (parameters and training steps).
Improves the efficiency of text-based music editing: performs various editing operations (adding, removing, separating, etc.) efficiently.
Extends the range of applications of music language models: increases their usability in dynamic music production environments.
Achieves performance comparable to models specialized for specific tasks.
Limitations:
The paper does not explicitly state its limitations. Additional experiments and analysis might reveal situations where performance degrades, or constraints on the type and complexity of music that can be edited.
Since it is built on the MusicGen model, the limitations of MusicGen itself may also carry over to Instruct-MusicGen.
Although 5,000 training steps is relatively few, it may be insufficient in certain situations.