PiRhDy: Learning Pitch-, Rhythm-, and Dynamics-aware Embeddings for Symbolic Music

Abstract.

Definitive embeddings remain a fundamental challenge of computational musicology for symbolic music in deep learning today. Analogous to natural language, a piece of music can be modeled as a sequence of tokens. This motivates most existing solutions to adapt word embedding models from natural language processing (e.g., skip-gram and CBOW) to build music embeddings. However, music differs from natural language in two key aspects: (1) a musical token is multi-faceted – it comprises pitch, rhythm, and dynamics information simultaneously; and (2) musical context is two-dimensional – each musical token depends on surrounding tokens from both the melodic and the harmonic context.

In this work, we attempt to provide a comprehensive solution by proposing a novel framework named PiRhDy that integrates pitch, rhythm, and dynamics information seamlessly. Specifically, PiRhDy adopts a hierarchical strategy that can be decomposed into two steps: (1) token (note event) modeling, which separately represents pitch, rhythm, and dynamics and integrates them into a single token embedding; and (2) context modeling, which utilizes melodic and harmonic knowledge to train the token embeddings. To examine our method, we conduct a thorough ablation study of PiRhDy's components and strategies. We further validate our embeddings on three downstream tasks: melody completion, accompaniment suggestion, and genre classification. We demonstrate that our PiRhDy embeddings significantly outperform the baseline methods.

Symbolic Music, Representation Learning, Embeddings

1. Conclusion

In this paper, we proposed a comprehensive framework (PiRhDy) that can embed music from scratch. The framework is built on a hierarchical strategy that handles music from the token level to the sequence level. In detail, we designed a token modeling network to fuse various features into a dense representation, and fed these token representations into a context modeling network (sequence level) to smooth the embeddings. The experimental results demonstrate the robustness and effectiveness of our PiRhDy embeddings.
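The two-step hierarchy can be illustrated with a minimal sketch: a token modeling step that fuses per-facet sub-embeddings of a note event into one dense vector, and a skip-gram-style context scoring step over melodic and harmonic neighbors. All vocabulary sizes, dimensions, and function names below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary sizes and dimensions (not from the paper).
N_PITCH, N_RHYTHM, N_DYN = 128, 32, 16
D_SUB, D_TOKEN = 8, 16

# One sub-embedding table per facet of a note event.
E_pitch = rng.normal(size=(N_PITCH, D_SUB))
E_rhythm = rng.normal(size=(N_RHYTHM, D_SUB))
E_dyn = rng.normal(size=(N_DYN, D_SUB))

# Fusion layer: concatenated facets -> single token embedding.
W_fuse = rng.normal(size=(3 * D_SUB, D_TOKEN))

def token_embedding(pitch, rhythm, dyn):
    """Step 1 (sketch): fuse pitch, rhythm, and dynamics indices
    into one dense token vector."""
    x = np.concatenate([E_pitch[pitch], E_rhythm[rhythm], E_dyn[dyn]])
    return np.tanh(x @ W_fuse)

def context_score(center, neighbors):
    """Step 2 (sketch): skip-gram-style compatibility of a token with
    its melodic (same voice) and harmonic (same onset) neighbors."""
    c = token_embedding(*center)
    return float(np.mean([c @ token_embedding(*n) for n in neighbors]))

# Example: a C4 quarter note at a mid dynamic level, scored against one
# melodic neighbor and one harmonic neighbor.
score = context_score((60, 4, 8), [(62, 4, 8), (64, 4, 8)])
```

In a trainable version, the tables and fusion weights would be optimized so that true melodic/harmonic neighbors score higher than negative samples, which is what smooths the token embeddings over their two-dimensional context.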

We believe the insights of PiRhDy are inspirational to future developments in computational musicology. Moreover, our PiRhDy embeddings can serve as the cornerstone of numerous applications in deep symbolic music learning. In the future, we plan to explore embeddings smoothed over rhythmic and tonal patterns, which are analogous to the syntax of natural language. The results of such a study can help to understand and interpret the structural characteristics of music. Another direction is to incorporate audio content into the learning procedure, so as to enable embeddings smoothed over multimodal content. Lastly, we will further investigate the potential of our embeddings in tasks such as music similarity and music generation.