PyTorch implementation of "Towards Proper Contrastive Self-supervised Learning Strategies for Music Audio Representation" by Jeong Choi et al.
In this work, we focus on assessing the potential of self-supervised music embeddings as a general-purpose representation. We set up experiments to compare how different self-supervision strategies perform across various MIR tasks. We investigate to what extent music audio representations learned with some widely used contrastive learning schemes are beneficial, by analyzing the results on three MIR tasks (instrument classification, genre classification, and music recommendation), each considered to represent a different aspect of music similarity. Our experiments use contrastive learning algorithms with variations in input/target instance settings and model architectures, designed to capture different levels of musical semantics: global or regional information. Our strategies are categorized in the following table.
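The contrastive schemes described above train an encoder so that the embeddings of a matching input/target pair score higher than those of the other items in the batch. As a minimal sketch, assuming an NT-Xent (normalized temperature-scaled cross-entropy) objective with in-batch negatives, the loss can be written in NumPy as follows. The function name `nt_xent_loss` and the default temperature are illustrative assumptions, not taken from this repository's PyTorch code.

```python
import numpy as np


def nt_xent_loss(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """NT-Xent loss for a batch of paired embeddings (illustrative sketch).

    z_a[i] and z_b[i] are embeddings of an input instance and its target
    instance; every other item in the 2N batch serves as a negative.
    The temperature default is an assumption, not a value from the paper.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)            # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalise so dot product = cosine similarity
    sim = z @ z.T / temperature                       # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                    # an item is never its own positive or negative
    # Row i's positive is its pair: i <-> i + N
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Average negative log-probability assigned to the true positive
    return float(-log_prob[np.arange(2 * n), pos].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchors = rng.normal(size=(8, 16))
    near_copies = anchors + 0.01 * rng.normal(size=anchors.shape)
    unrelated = rng.normal(size=(8, 16))
    print(nt_xent_loss(anchors, near_copies))  # low: positives are nearly identical
    print(nt_xent_loss(anchors, unrelated))    # higher: positives carry no shared information
```

Pairs that share information yield a much lower loss than unrelated pairs, which is the signal that pushes the encoder toward a shared representation of the paired instances.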
We then use the trained models as feature extractors and evaluate them on different MIR tasks, where each task represents a certain abstraction level of music audio information. We compare the self-supervised embeddings with MFCCs, which have long been a solid baseline feature in audio classification tasks.
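To compare learned embeddings against an MFCC baseline on a classification task, frame-level features of either kind first have to be summarized as one vector per clip. The sketch below assumes mean-and-standard-deviation pooling and uses a nearest-centroid classifier as a simple stand-in for whatever downstream classifier the experiments actually use. `pool_frames` and `nearest_centroid_accuracy` are hypothetical helpers. A frame-level MFCC matrix could come from, e.g., `librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).T`, giving shape (frames, coefficients).

```python
import numpy as np


def pool_frames(frames: np.ndarray) -> np.ndarray:
    """Summarise a (T, D) frame-level matrix (MFCCs or embeddings) as a 2D-dim clip vector.

    Mean-and-std pooling is an assumed choice for illustration.
    """
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])


def nearest_centroid_accuracy(train_x: np.ndarray, train_y: np.ndarray,
                              test_x: np.ndarray, test_y: np.ndarray) -> float:
    """Fit one centroid per class on training clips and report test accuracy."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test clip to every class centroid
    dists = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    predictions = classes[dists.argmin(axis=1)]
    return float((predictions == test_y).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def fake_clips(label: int, count: int) -> np.ndarray:
        # Synthetic stand-in for per-clip frame features: 50 frames x 20 dims
        return np.stack([pool_frames(rng.normal(loc=5.0 * label, scale=0.1, size=(50, 20)))
                         for _ in range(count)])

    train_x = np.concatenate([fake_clips(0, 10), fake_clips(1, 10)])
    train_y = np.array([0] * 10 + [1] * 10)
    test_x = np.concatenate([fake_clips(0, 5), fake_clips(1, 5)])
    test_y = np.array([0] * 5 + [1] * 5)
    print(nearest_centroid_accuracy(train_x, train_y, test_x, test_y))
```

Because both feature types pass through the same pooling and classifier, any difference in accuracy reflects the features themselves rather than the evaluation pipeline.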
Run the following command to pre-train the model on the FMA_small dataset.
python main.py --USE_YAML_CONFIG 0
To test a trained model, make sure the LOAD_WEIGHT_FROM option is set, or specify it as a command-line argument:
python main.py --MODE inference --LOAD_WEIGHT_FROM SomeCheckpointPath
To be updated.


