Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
This page is summarized using Google Gemini and is operated on a non-profit basis.
The copyright of the paper belongs to the author and the relevant institution. When sharing, simply cite the source.

SAGE-Music: Low-Latency Symbolic Music Generation via Attribute-Specialized Key-Value Head Sharing

Created by
  • Haebom

Authors

Jiaye Tan, Haonan Luo, Linfeng Song, Shuaiqi Chen, Yishan Lyu, Zian Zhong, Roujia Wang, Daniel Jiang, Haoran Zhang, Jiaming Bai, Haoran Cheng, Q. Vera Liao, Hao-Wen Dong

Outline

Low-latency symbolic music generation is essential for real-time improvisation and human-AI co-creation, but existing Transformer-based models face a trade-off between inference speed and musical quality. Byte Pair Encoding (BPE) is effective on single-track piano data, yet its performance degrades significantly in multi-track settings. In this paper, we propose AS-KVHS (Attribute-Specialized Key-Value Head Sharing), a method designed for the structured symbolic representation of music. It speeds up inference by approximately 30%, at the cost of an approximately 0.4% quality drop in objective evaluations, and yields slight improvements in subjective listening tests. We also release SAGE-Music, an open-source benchmark that matches or surpasses state-of-the-art models in generation quality.
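To make the idea of key-value head sharing more concrete, here is a minimal sketch of the generic technique, in which several query heads read from one shared key-value head so that fewer keys and values have to be cached during decoding. It is not the paper's attribute-specialized scheme, whose details this summary does not give. The function names, the grouping rule, and the configuration values are all assumptions made for illustration, and the resulting cache ratio is not meant to reproduce the roughly 30% speedup reported above.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: one tensor for keys and one for values in every layer.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


def shared_kv_head(query_head, num_query_heads, num_kv_heads):
    """Index of the KV head a query head reads from when consecutive
    query heads are grouped onto one shared KV head (an assumed rule)."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("num_query_heads must be a multiple of num_kv_heads")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


# Hypothetical configuration, chosen only for illustration.
NUM_LAYERS, NUM_QUERY_HEADS, NUM_KV_HEADS = 12, 8, 2
HEAD_DIM, SEQ_LEN = 64, 2048

# Without sharing, every query head has its own KV head.
full = kv_cache_bytes(NUM_LAYERS, NUM_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
# With sharing, only NUM_KV_HEADS key-value heads are cached.
shared = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, SEQ_LEN)
mapping = [
    shared_kv_head(h, NUM_QUERY_HEADS, NUM_KV_HEADS)
    for h in range(NUM_QUERY_HEADS)
]

print(mapping)        # [0, 0, 0, 0, 1, 1, 1, 1]
print(shared / full)  # 0.25
```

In this toy setting the cache shrinks to a quarter of its original size. How much of that translates into lower latency depends on the model and hardware, and which heads can share keys and values without hurting quality is the question the attribute-specialized design is meant to address.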

Takeaways, Limitations

Takeaways:
We conduct the first systematic study of how well BPE generalizes to multi-track symbolic music.
We introduce AS-KVHS for low-latency symbolic music generation, improving the trade-off between inference speed and quality.
We release the open-source benchmark SAGE-Music to support reproducibility and further research.
Limitations:
The effectiveness of AS-KVHS has been measured primarily through objective evaluations and subjective listening tests; whether it generalizes to different music genres and complex musical structures requires further research.
Further research is needed to determine whether AS-KVHS can be applied to other music generation models and architectures.
Further analysis of the limitations and potential for improvement of the SAGE-Music benchmark is needed.