Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

PianoVAM: A Multimodal Piano Performance Dataset

Created by
  • Haebom

Authors

Yonghyun Kim, Junhyung Park, Joonhyung Bae, Kirak Kim, Taegyun Kwon, Alexander Lerch, Juhan Nam

Outline

PianoVAM is a comprehensive piano performance dataset spanning multiple modalities: video, audio, MIDI, hand landmarks, fingering labels, and rich metadata. It was recorded on a Disklavier piano during the daily practice sessions of amateur pianists, capturing audio and MIDI alongside synchronized top-view video under realistic and varied performance conditions. Hand landmarks and fingering labels were extracted using a pre-trained hand pose estimation model and a semi-automatic fingering annotation algorithm. The authors discuss the challenges encountered during data collection and cross-modal alignment, and describe a fingering annotation method based on hand landmarks extracted from video. They present benchmark results for audio-only and audio-visual piano transcription on PianoVAM and discuss further potential applications.
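
To make the idea of deriving per-note finger labels from hand landmarks more concrete, here is a minimal sketch. It assigns a pressed key to the nearest detected fingertip and returns nothing when no fingertip is close enough, leaving that note for manual review. This is an illustrative assumption, not the PianoVAM authors' algorithm: the function name `assign_finger`, the pixel coordinate convention, and the `max_dist` threshold are all hypothetical.

```python
from math import hypot


def assign_finger(key_xy, fingertips, max_dist=40.0):
    """Return (hand, finger_number) for the fingertip nearest to a pressed key.

    Hypothetical helper for illustration only.

    key_xy:     (x, y) pixel position of the pressed key in a top-view frame.
    fingertips: dict mapping a hand name ("left" or "right") to a list of five
                (x, y) fingertip positions, ordered thumb (1) to pinky (5).
    max_dist:   assumed pixel threshold; if the nearest fingertip is farther
                than this, return None so the note can be labeled manually.
    """
    best = None
    best_dist = float("inf")
    for hand, tips in fingertips.items():
        for number, (x, y) in enumerate(tips, start=1):
            dist = hypot(x - key_xy[0], y - key_xy[1])
            if dist < best_dist:
                best, best_dist = (hand, number), dist
    if best is None or best_dist > max_dist:
        return None  # ambiguous or undetected: defer to a human annotator
    return best


# Made-up fingertip positions for one video frame.
frame_tips = {
    "left": [(10, 100), (30, 100), (50, 100), (70, 100), (90, 100)],
    "right": [(200, 100), (220, 100), (240, 100), (260, 100), (280, 100)],
}
label = assign_finger((222, 98), frame_tips)
```

The fallback to `None` mirrors the "semi-automatic" aspect: confident cases are labeled automatically, and the rest are passed to a person.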

Takeaways, Limitations

Takeaways:
Advances research in music information retrieval (MIR) by providing a comprehensive piano performance dataset that spans multiple modalities.
Enables more realistic research, because the dataset reflects actual performance conditions.
Provides benchmark results for audio-only and audio-visual piano transcription.
Presents a new fingering annotation method based on hand landmarks.
Limitations:
The dataset is limited to performance data from amateur pianists.
The semi-automatic fingering annotation algorithm may introduce labeling errors.
Data collection and alignment across modalities proved difficult; the specific limitations require further explanation.