This page curates papers on artificial intelligence published around the world. Summaries are generated with Google Gemini, and the page is operated on a non-profit basis. Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.
Low-latency symbolic music generation is essential for real-time improvisation and human-AI co-creation, yet existing Transformer-based models face a tradeoff between inference speed and musical quality. Byte Pair Encoding (BPE), while effective on single-track piano data, suffers significant performance degradation in multi-track settings. In this paper, we propose AS-KVHS (Attribute-Specialized Key-Value Head Sharing), a method adapted to the structured symbolic representation of music. It achieves an inference speedup of approximately 30% with only about 0.4% quality degradation in objective evaluations and slight improvements in subjective listening tests. We also release SAGE-Music, an open-source benchmark that matches or surpasses state-of-the-art models in generation quality.
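The summary does not say how AS-KVHS assigns heads, so the sketch below only illustrates the generic idea its name refers to: key-value head sharing in the style of grouped-query attention, where several query heads read one shared set of cached keys and values. This shrinks the KV cache that autoregressive decoding must store and re-read at every step. The attribute names, the query-to-KV grouping (`groups`), and the helpers `AttentionConfig`, `kv_cache_bytes`, `query_to_kv_head`, and `shared_kv_attention` are all hypothetical; this is a sketch under those assumptions, not the paper's implementation.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class AttentionConfig:
    n_layers: int
    n_query_heads: int
    n_kv_heads: int  # equals n_query_heads in standard multi-head attention
    head_dim: int
    bytes_per_value: int = 2  # e.g. fp16 storage

    def __post_init__(self) -> None:
        if not 1 <= self.n_kv_heads <= self.n_query_heads:
            raise ValueError("need 1 <= n_kv_heads <= n_query_heads")


def kv_cache_bytes(cfg: AttentionConfig, seq_len: int, batch: int = 1) -> int:
    """Bytes needed to cache keys and values for seq_len decoded tokens."""
    # Factor 2: one cached tensor for keys and one for values, per layer and KV head.
    return (2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim
            * seq_len * batch * cfg.bytes_per_value)


def query_to_kv_head(groups: dict[str, list[int]]) -> dict[int, int]:
    """Give each (hypothetical) attribute group one shared KV head."""
    mapping: dict[int, int] = {}
    for kv_idx, q_heads in enumerate(groups.values()):
        for q in q_heads:
            if q in mapping:
                raise ValueError(f"query head {q} appears in two groups")
            mapping[q] = kv_idx
    return mapping


def shared_kv_attention(
    queries: list[list[float]],       # [n_query_heads][head_dim], current step
    keys: list[list[list[float]]],    # [n_kv_heads][seq_len][head_dim], cached
    values: list[list[list[float]]],  # [n_kv_heads][seq_len][head_dim], cached
    mapping: dict[int, int],
) -> list[list[float]]:
    """One decoding step of scaled dot-product attention with shared KV heads."""
    outputs = []
    for q_idx, q in enumerate(queries):
        kv = mapping[q_idx]  # several query heads read the same cached K/V
        scale = 1.0 / math.sqrt(len(q))
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys[kv]]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        dim = len(values[kv][0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values[kv])) / total for j in range(dim)]
        )
    return outputs


# Usage: 8 query heads grouped into 4 hypothetical attribute groups.
groups = {"pitch": [0, 1, 2], "duration": [3, 4], "velocity": [5], "instrument": [6, 7]}
mapping = query_to_kv_head(groups)
mha = AttentionConfig(n_layers=12, n_query_heads=8, n_kv_heads=8, head_dim=64)
shared = AttentionConfig(n_layers=12, n_query_heads=8, n_kv_heads=len(groups), head_dim=64)
print(kv_cache_bytes(shared, seq_len=1024) / kv_cache_bytes(mha, seq_len=1024))  # 0.5
```

With 8 query heads sharing 4 KV heads, the cache for the same sequence length is half the size of standard multi-head attention. A smaller cache is only one contributor to decoding speed, so this ratio should not be read as a prediction of the roughly 30% speedup reported for AS-KVHS.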
Takeaways, Limitations
• Takeaways:
◦ We conducted the first systematic study of how well BPE generalizes to multi-track symbolic music.
◦ We introduce AS-KVHS for low-latency symbolic music generation, improving the tradeoff between inference speed and quality.
◦ We release the open-source benchmark SAGE-Music to support reproducibility and further research.
• Limitations:
◦ AS-KVHS has been evaluated mainly through objective metrics and subjective listening tests; its generalizability to other music genres and more complex musical structures requires further study.
◦ Whether AS-KVHS transfers to other music generation models and architectures remains to be investigated.
◦ The limitations of the SAGE-Music benchmark and its room for improvement call for further analysis.