This is a page that curates AI-related papers published worldwide. All content here is summarized using Google Gemini and operated on a non-profit basis. Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.
RFM-Editing: Rectified Flow Matching for Text-guided Audio Editing
Created by
Haebom
Authors
Liting Gao, Yi Yuan, Yaru Chen, Yuelan Cheng, Zhenbo Li, Juan Wen, Shubin Zhang, Wenwu Wang
Outline
This paper proposes RFM-Editing, an efficient end-to-end diffusion framework based on rectified flow matching for text-guided audio editing. Existing training-based and zero-shot approaches either struggle with complex edits or are impractical to use. The proposed method achieves faithful semantic alignment without requiring auxiliary captions or masks, while maintaining competitive editing quality. The authors also construct a dataset of overlapping multi-event audio to support training and benchmarking in complex editing scenarios.
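The summary names rectified flow matching without showing how it works. For orientation, here is a minimal NumPy sketch of the generic rectified flow recipe: straight-line interpolation between Gaussian noise and data, a squared-error velocity target, and fixed-step Euler sampling. It is not the authors' implementation. The time convention (t=0 noise, t=1 data), all function names, and the `cond` argument (a hypothetical stand-in for a text or source-audio embedding) are assumptions. `toy_velocity` is an analytic placeholder for a trained network that is exact only when every data sample equals `cond`, so the example runs without training.

```python
import numpy as np


def rf_pair(x0, x1, t):
    """Straight-line interpolation x_t = t*x1 + (1-t)*x0 and its target velocity x1 - x0.

    Assumed convention: t=0 is Gaussian noise (x0), t=1 is data (x1).
    """
    # Reshape t to (batch, 1, ..., 1) so it broadcasts over the feature axes.
    t = np.asarray(t, dtype=float).reshape(-1, *([1] * (x0.ndim - 1)))
    x_t = t * x1 + (1.0 - t) * x0
    return x_t, x1 - x0


def rf_loss(velocity_fn, x0, x1, t, cond):
    """Flow matching loss: mean squared error between predicted and target velocity."""
    x_t, v_target = rf_pair(x0, x1, t)
    v_pred = velocity_fn(x_t, t, cond)
    return float(np.mean((v_pred - v_target) ** 2))


def euler_sample(velocity_fn, x0, cond, steps=20):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with fixed-step Euler."""
    x = np.array(x0, dtype=float)  # copy so the caller's noise is not modified
    h = 1.0 / steps
    for k in range(steps):
        t = np.full(x.shape[0], k * h)
        x = x + h * velocity_fn(x, t, cond)
    return x


def toy_velocity(x_t, t, cond):
    """Hypothetical stand-in for a trained network (2-D inputs only).

    It is the exact velocity field when every data sample equals `cond`,
    so no training is needed to run the example. Requires t < 1.
    """
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    return (cond - x_t) / (1.0 - t)


rng = np.random.default_rng(0)
target = np.array([0.5, -1.0, 2.0])   # stands in for an edited audio latent
x0 = rng.standard_normal((4, 3))      # batch of Gaussian noise
x1 = np.tile(target, (4, 1))          # data batch: every sample equals target
t = rng.uniform(0.0, 0.99, size=4)    # random times, kept away from t = 1

print(rf_loss(toy_velocity, x0, x1, t, target))  # ~0 because the toy field is exact
samples = euler_sample(toy_velocity, x0, target)
print(np.allclose(samples, target))              # True
```

Rectified flow is motivated by the idea that straighter noise-to-data paths need fewer integration steps. In this toy case the path is exactly straight, so the final Euler step lands on `target` for any step count; a real, learned velocity field only approximates this.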
Takeaways, Limitations
• Takeaways:
  ◦ The paper provides an efficient and accurate end-to-end solution for text-guided audio editing.
  ◦ It achieves faithful semantic alignment without auxiliary captions or masks.
  ◦ It contributes a new dataset of overlapping multi-event audio to support future research.
  ◦ It demonstrates competitive editing quality across several evaluation metrics.
• Limitations:
  ◦ Further research is needed to determine how well the proposed method generalizes.
  ◦ Experiments on more diverse and complex audio data are needed.
  ◦ The paper lacks a detailed analysis of computational cost and memory usage.