This page collects papers on artificial intelligence published around the world. Summaries are generated with Google Gemini, and the page is operated on a non-profit basis. Copyright of each paper belongs to its authors and their institutions; when sharing, please cite the source.
Stylus: Repurposing Stable Diffusion for Training-Free Music Style Transfer on Mel-Spectrograms
Created by
Haebom
Author
Heehwan Wang, Joonwoo Kwon, Sooyoung Kim, Jungwoo Seo, Shinjae Yoo, Yuewei Lin, Jiook Cha
Outline
This paper presents Stylus, a training-free framework for musical style transfer in the Mel-spectrogram domain that leverages a pre-trained Stable Diffusion model. Stylus manipulates the self-attention mechanism by injecting the style's key-value features while keeping the source's queries, which preserves the musical structure of the content. To avoid artifacts caused by Griffin-Lim reconstruction, the authors introduce a phase-preserving reconstruction strategy, and they adopt a control scheme inspired by classifier-free guidance for adaptive stylization and multi-style mixing. Experimental results show that, without any additional training, Stylus improves content preservation by 34.1% and perceptual quality by 25.7% over existing state-of-the-art methods.
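The core idea, attending with the source's queries over the style's keys and values, can be sketched in a few lines. This is a simplified, hypothetical illustration (the function and variable names are not from the paper; the actual method operates inside Stable Diffusion's U-Net self-attention layers on Mel-spectrogram latents):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kv_injected_attention(q_src, k_sty, v_sty):
    """Attention with source queries and injected style keys/values.

    Keeping q_src preserves the source's structure, while k_sty/v_sty
    carry the style features -- a toy view of the KV-injection idea.
    """
    d = q_src.shape[-1]
    weights = softmax(q_src @ k_sty.T / np.sqrt(d))  # (n_src, n_sty)
    return weights @ v_sty                           # (n_src, d)

# Toy shapes: 4 source tokens, 6 style tokens, feature dim 8.
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((6, 8))
v = rng.standard_normal((6, 8))
out = kv_injected_attention(q, k, v)
print(out.shape)  # (4, 8)
```

In practice this swap happens per attention layer during the diffusion denoising loop, with the guidance-style control blending stylized and unstylized attention outputs to set stylization strength.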
Takeaways, Limitations
•
Takeaways:
◦
Transfers musical style without any additional training by leveraging a pre-trained Stable Diffusion model.
◦
Achieves better content preservation and perceptual quality than existing methods.
◦
Gains performance from the phase-preserving reconstruction strategy and classifier-free-guidance-style control.
◦
Offers a potentially efficient tool for music personalization and creation.
•
Limitations:
◦
Performance is bounded by its dependence on the pre-trained Stable Diffusion model.
◦
Operating in the Mel-spectrogram domain may degrade audio quality.
◦
Generalization across diverse music genres still needs to be evaluated.
◦
Additional subjective evaluation by practicing composers and musicians is needed.