Daily Arxiv

This page curates AI-related papers published worldwide.
All content is summarized by Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

Exploring Adapter Design Tradeoffs for Low Resource Music Generation

Created by
  • Haebom

Authors

Atharva Mehta, Shivam Chauhan, Monojit Choudhury

Outline

This paper studies parameter-efficient fine-tuning (PEFT), particularly adapter-based methods, for large-scale music generation models such as MusicGen and Mustango. The authors search for optimal adapter designs by comparing configurations that vary in architecture, placement, and size on two low-resource music genres: Hindustani classical music and Turkish Makam music. They find that convolution-based adapters excel at capturing fine-grained musical details, while transformer-based adapters better preserve long-range dependencies, and that medium-sized adapters (around 40M parameters) offer the best balance between expressiveness and quality. Mustango, a diffusion-based model, generates diverse outputs but suffers from training instability, while MusicGen, an autoregressive model, trains quickly and produces high-quality audio, though its outputs can be somewhat repetitive.
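
To make the design space concrete, here is a minimal PyTorch sketch of the two adapter families compared in the paper; the module structure, bottleneck width, kernel size, and residual placement are illustrative assumptions rather than the authors' exact configurations.

```python
import torch
import torch.nn as nn

class ConvAdapter(nn.Module):
    """Bottleneck adapter built from 1D convolutions over the time axis.
    Local receptive fields suit fine-grained detail such as ornaments."""
    def __init__(self, d_model: int, bottleneck: int = 256, kernel_size: int = 3):
        super().__init__()
        self.down = nn.Conv1d(d_model, bottleneck, kernel_size, padding=kernel_size // 2)
        self.act = nn.GELU()
        self.up = nn.Conv1d(bottleneck, d_model, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model); Conv1d expects (batch, channels, time)
        h = self.up(self.act(self.down(x.transpose(1, 2)))).transpose(1, 2)
        return x + h  # residual keeps the frozen backbone's signal intact

class TransformerAdapter(nn.Module):
    """Adapter with a small self-attention block, giving it global context
    that helps preserve long-range musical dependencies."""
    def __init__(self, d_model: int, n_heads: int = 4, bottleneck: int = 256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.down = nn.Linear(d_model, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.attn(x, x, x)            # attend across the whole sequence
        h = self.up(self.act(self.down(h)))  # bottleneck projection
        return x + h                         # residual connection
```

Either module would be interleaved between the frozen layers of the backbone, so only the adapter weights are updated during genre-specific fine-tuning.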

Takeaways, Limitations

Takeaways:
Convolution-based adapters are effective for fine-grained musical expression (ornaments, short melodic phrases), while transformer-based adapters are better at maintaining long-term dependencies.
A medium-sized adapter of roughly 40M parameters offers the best balance between expressiveness and quality (see the sizing sketch after this list).
A comparative analysis of the strengths and weaknesses of MusicGen and Mustango provides guidelines for model selection.
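
As a rough illustration of how such a parameter budget is checked in practice, the sketch below freezes a pretrained backbone and counts the remaining trainable adapter parameters; the convention that adapter modules carry "adapter" in their parameter names is an assumption made for this example.

```python
import torch.nn as nn

def freeze_backbone(model: nn.Module) -> None:
    """Freeze pretrained weights so only adapter modules stay trainable.
    Assumes adapter parameters contain 'adapter' in their names."""
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name

def trainable_millions(model: nn.Module) -> float:
    """Count trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# After inserting adapters into a MusicGen/Mustango backbone, adjust the
# bottleneck width until trainable_millions(model) lands near 40M.
```
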
Limitations:
The genres studied were limited to Hindustani classical music and Turkish Makam music.
Generalizability to other low-resource music genres requires further study.