Posted on 2025-05-01, 00:00. Authored by Eleonora Quaranta.
In recent years, advancements in artificial intelligence have driven a growing demand for
personalized user experiences across digital platforms. In the music domain, this
trend is reflected in the need for more sophisticated recommendation systems that go
beyond traditional collaborative filtering methods.
This thesis introduces an emotion-based multimodal music classifier, leveraging both
audio features and song lyrics to capture the emotional content of music. By focusing on
song content and emotional attributes, this approach aims to lay the groundwork for
recommendation systems capable of providing users with a more customized and
emotionally resonant experience, while also addressing the cold-start problem typical of
collaborative filtering-based recommendation.
Following an overview of the existing literature and an examination of the challenges
posed by this specific field, the first contribution of this work is the creation of a
suitable dataset for Music Emotion Recognition. This is achieved by extending a subset of
the Music4All-Onion dataset with emotion-based labels for song lyrics, derived using an
eight-class emotional model. The audio data, available in the form of pre-extracted
acoustic features, is analyzed with unsupervised machine learning methods to model the
underlying structures and patterns associated with emotional music content.
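As an illustration of what this unsupervised step might look like in practice, the sketch below clusters pre-extracted acoustic features with k-means and inspects the result with a silhouette score. The file name, column names, and the choice of eight clusters (mirroring the eight-class emotion scheme used for the lyrics) are illustrative assumptions, not details taken from the thesis or from the Music4All-Onion release.

```python
# Minimal sketch: clustering pre-extracted acoustic features with k-means.
# File name, column names, and the eight-cluster setting are illustrative.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical table of per-track acoustic descriptors.
features = pd.read_csv("acoustic_features.csv", index_col="track_id")

# Standardize so that no single descriptor dominates the distance metric.
X = StandardScaler().fit_transform(features.values)

# Eight clusters mirror the eight-class emotion scheme; other values of k
# could be compared using the silhouette score printed below.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)

print("silhouette score:", silhouette_score(X, cluster_ids))
```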
Different baseline methods are identified as benchmarks to comparatively evaluate the
proposed approaches. The final emotion-based multimodal classifier relies primarily on
textual data in the form of song lyrics, incorporating acoustic information as an auxiliary
feature to enhance emotion classification. The model achieves promising results in the
context of Music Emotion Recognition, especially given the data-availability issues that
characterize research in this field.
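A minimal sketch of how such a lyrics-first multimodal classifier could be assembled is given below: TF-IDF features from the lyrics form the primary representation, and pre-extracted acoustic descriptors are appended as auxiliary columns before a standard linear classifier is trained. The data layout, column names, and choice of classifier are assumptions for illustration, not the model actually used in the thesis.

```python
# Minimal sketch of a lyrics-first multimodal emotion classifier.
# Data layout, column names, and the classifier choice are illustrative.
import pandas as pd
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical table with columns: lyrics, emotion, acoustic_* descriptors.
data = pd.read_csv("lyrics_with_labels.csv")

# Primary modality: song lyrics represented as TF-IDF n-grams.
tfidf = TfidfVectorizer(max_features=20000, ngram_range=(1, 2))
X_text = tfidf.fit_transform(data["lyrics"])

# Auxiliary modality: pre-extracted acoustic descriptors appended as extra columns.
acoustic_cols = [c for c in data.columns if c.startswith("acoustic_")]
X_audio = csr_matrix(data[acoustic_cols].to_numpy(dtype=float))
X = hstack([X_text, X_audio]).tocsr()

y = data["emotion"]  # one of the eight emotion classes
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```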