Daily Arxiv

This page curates papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions; please cite the source when sharing.

Stylus: Repurposing Stable Diffusion for Training-Free Music Style Transfer on Mel-Spectrograms

Created by
  • Haebom

Author

Heehwan Wang, Joonwoo Kwon, Sooyoung Kim, Jungwoo Seo, Shinjae Yoo, Yuewei Lin, Jiook Cha

Outline

This paper presents Stylus, a training-free framework for music style transfer in the mel-spectrogram domain that leverages a pre-trained Stable Diffusion model. Stylus manipulates the self-attention mechanism by injecting stylistic key-value features while keeping the source queries, thereby preserving the musical structure of the content. To avoid artifacts from Griffin-Lim reconstruction, the authors introduce a phase-preserving reconstruction strategy, and they adopt a control scheme inspired by classifier-free guidance for adaptive stylization and multi-style mixing. Experiments show that Stylus improves content preservation by 34.1% and perceptual quality by 25.7% over existing state-of-the-art methods, without any additional training.
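The attention manipulation described above, where queries come from the source to preserve structure while keys and values are injected from the style, can be illustrated with a minimal sketch. This is an assumption-laden toy (projection matrices, multi-head splitting, and the diffusion U-Net are omitted; all names are illustrative), not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def style_injected_attention(content_feats, style_feats):
    """Toy self-attention step in the spirit of Stylus:
    queries (Q) are taken from the content features to keep the
    source's musical structure, while keys/values (K, V) are
    injected from the style features to transfer its character.
    Shapes: content (n, d), style (m, d)."""
    d = content_feats.shape[-1]
    q = content_feats                 # preserve source queries
    k = v = style_feats               # inject stylistic key-value features
    attn = softmax(q @ k.T / np.sqrt(d))
    return attn @ v                   # output keeps the content's token count

rng = np.random.default_rng(0)
content = rng.standard_normal((4, 8))  # e.g. 4 spectrogram patches
style = rng.standard_normal((6, 8))    # e.g. 6 patches from the style clip
out = style_injected_attention(content, style)
print(out.shape)  # (4, 8): one output row per content token
```

Because the attention weights are computed against style keys but indexed by content queries, each output token is a style-feature mixture arranged according to the source's structure.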

Takeaways, Limitations

Takeaways:
  • Transfers musical styles without any training data by leveraging a pre-trained model.
  • Achieves improved content preservation and perceptual quality over existing methods.
  • Gains performance through a phase-preserving reconstruction strategy and classifier-free-guidance-based control.
  • Offers the potential for efficient music personalization and creation tools.
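The classifier-free-guidance-inspired control noted above can be sketched in a few lines. The function names and the linear multi-style mixing rule are assumptions for illustration; the paper's exact control scheme may differ.

```python
import numpy as np

def cfg_blend(uncond, cond, scale):
    """Classifier-free-guidance-style control: extrapolate from the
    unconditional prediction toward the style-conditioned one.
    scale = 0 ignores the style; larger scale stylizes more strongly."""
    return uncond + scale * (cond - uncond)

def mix_styles(uncond, style_preds, weights):
    """Hypothetical multi-style mixing: sum per-style guidance
    directions, each weighted independently."""
    out = np.array(uncond, dtype=float, copy=True)
    for pred, w in zip(style_preds, weights):
        out += w * (pred - uncond)
    return out

uncond = np.zeros(3)
style_a = np.ones(3)
style_b = 2.0 * np.ones(3)
print(cfg_blend(uncond, style_a, 2.0))          # strong single-style guidance
print(mix_styles(uncond, [style_a, style_b], [0.5, 0.5]))  # blend two styles
```

Varying `scale` (or the per-style `weights`) is what makes the stylization adaptive: the same pre-trained model yields anything from a faint tint to a heavy restyling without retraining.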
Limitations:
  • Dependence on the pre-trained Stable Diffusion model constrains the approach.
  • Operating on mel-spectrograms may degrade audio quality.
  • Generalization across diverse music genres remains to be evaluated.
  • Subjective evaluation by practicing musicians is still needed.