Daily Arxiv

This page organizes papers related to artificial intelligence published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their institutions. When sharing, please cite the source.

Embedding Alignment in Code Generation for Audio

Created by
  • Haebom

Authors

Sam Kouteili, Hiren Madhu, George Typaldos, Mark Santolucito

Outline

LLM-based code generation has the potential to revolutionize creative coding tasks, such as live coding, by allowing users to focus on structural motifs rather than syntactic details. When prompting an LLM, users may benefit from considering several varied code candidates to better realize their musical intent. However, code generation models struggle to present unique and diverse code candidates because they have no direct insight into the audio that the code produces. To better establish the relationship between code candidates and the generated audio, we investigate the topology of the mapping between the code and audio embedding spaces. While we find that code and audio embeddings do not exhibit a simple linear relationship, we supplement this finding with a constructed predictive model showing that an embedding alignment map can be learned. In support of musically diverse output, we propose a model that, given code, predicts the embedding of the output audio, thereby constructing a code-audio embedding alignment map.
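The contrast between a linear and a learned nonlinear alignment map can be illustrated with a small sketch. This is not the authors' method or data: the embeddings are synthetic, the quadratic relationship between them is an assumption chosen so that a linear map fails, and the nonlinear model is a fixed random tanh layer with ridge regression, standing in for whatever predictive model the paper actually trains. All function names are hypothetical.

```python
import numpy as np


def add_bias(x):
    """Append a constant column so the fitted maps include an intercept."""
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_ridge(features, targets, reg=1e-3):
    """Closed-form ridge regression: solve (F^T F + reg*I) W = F^T Y."""
    gram = features.T @ features + reg * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ targets)


def r2_score(y_true, y_pred):
    """Coefficient of determination pooled over all embedding dimensions."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot


def random_features(x, proj, offset):
    """Fixed random tanh layer: a cheap nonlinear stand-in for a trained network."""
    return add_bias(np.tanh(x @ proj + offset))


def run_demo(seed=0, n_train=300, n_test=200, code_dim=3, audio_dim=2, hidden=200):
    rng = np.random.default_rng(seed)
    # Synthetic "code embeddings" (assumption: real ones would come from a code encoder).
    code = rng.uniform(-1.0, 1.0, size=(n_train + n_test, code_dim))
    # Synthetic "audio embeddings": a deliberately nonlinear (quadratic) function of the code.
    mix = rng.normal(size=(code_dim, audio_dim))
    audio = (code @ mix) ** 2
    code_tr, code_te = code[:n_train], code[n_train:]
    audio_tr, audio_te = audio[:n_train], audio[n_train:]

    # Linear alignment map from code space to audio space.
    w_lin = fit_ridge(add_bias(code_tr), audio_tr)
    r2_linear = r2_score(audio_te, add_bias(code_te) @ w_lin)

    # Nonlinear alignment map: random tanh features followed by ridge regression.
    proj = rng.normal(size=(code_dim, hidden))
    offset = rng.normal(size=hidden)
    w_nl = fit_ridge(random_features(code_tr, proj, offset), audio_tr)
    r2_nonlinear = r2_score(audio_te, random_features(code_te, proj, offset) @ w_nl)
    return r2_linear, r2_nonlinear


r2_linear, r2_nonlinear = run_demo()
print(f"held-out R^2, linear map:    {r2_linear:.3f}")
print(f"held-out R^2, nonlinear map: {r2_nonlinear:.3f}")
```

Comparing held-out scores of the two maps is one simple way to probe whether a relationship between two embedding spaces is linear. With real data, the synthetic arrays would be replaced by paired embeddings of code snippets and the audio they render.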

Takeaways, Limitations

Takeaways: Analyzing the topology of the mapping between code and audio embeddings suggests a way to improve LLM-based code generation models. A predictive model that learns a code-to-audio embedding alignment map could enable musically diverse code generation, which opens up new possibilities in creative coding fields such as live coding.
Limitations: Although the relationship between code and audio embeddings is shown not to be simply linear, details on the specific method for learning the embedding alignment map and on its performance are lacking. Further research is needed to evaluate how well the proposed model generalizes and whether it applies to various music genres. Its effectiveness in a real-world live coding environment also remains to be validated.
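As an illustration of how predicted audio embeddings might support musically diverse generation, the sketch below picks a subset of code candidates whose predicted embeddings are far apart. The greedy farthest-point rule, the function name `select_diverse`, and the toy embeddings are all assumptions made for illustration; the paper summary does not say how diversity is enforced.

```python
import numpy as np


def select_diverse(embeddings, k):
    """Greedy farthest-point selection: return indices of k mutually distant rows."""
    embeddings = np.asarray(embeddings, dtype=float)
    n = embeddings.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of candidates")
    # Start from the candidate farthest from the centroid of all candidates.
    centroid = embeddings.mean(axis=0)
    first = int(np.argmax(np.linalg.norm(embeddings - centroid, axis=1)))
    chosen = [first]
    # min_dist[i] is the distance from candidate i to its nearest chosen candidate.
    min_dist = np.linalg.norm(embeddings - embeddings[first], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        dist_to_new = np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return chosen


# Toy predicted audio embeddings for six hypothetical code candidates:
# three near the origin, two near (5, 5), and one near (-4, 3).
predicted = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0],
    [-4.0, 3.0],
])
picked = select_diverse(predicted, k=3)
print("selected candidate indices:", picked)
```

Because selection runs on predicted embeddings, no candidate has to be rendered to audio before the shortlist is shown to the user. The quality of the shortlist then depends entirely on how well the alignment map generalizes, which is the open question raised in the limitations.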