University of Illinois Chicago

Ainur: Enhancing Vocal Quality through Lyrics-Audio Embeddings in Multimodal Deep Music Generation

thesis
Posted on 2023-08-01, 00:00; authored by Giuseppe Concialdi
As an emerging research field, deep music generation faces significant challenges: the high dimensionality of audio data, computational resource requirements, and quality concerns, particularly with generated vocals. This study addresses these concerns by introducing Ainur, a deep learning model designed specifically to enhance the quality of generated vocals. We investigate the effectiveness of various deep learning techniques and multimodal input conditioning strategies for improving vocal generation. We also examine the utility of transfer learning and pre-trained models, along with the impact of multimodal input strategies on the quality and diversity of the produced music. Ainur employs a hierarchical diffusion model and a latent diffusion prior to handle high-dimensional data, and uses Contrastive Lyrics-Audio Spectrogram Pre-training (CLASP) embeddings for multimodal data fusion. Our findings show that Ainur can produce high-quality and varied music, and they support the use of our proposed novel evaluation metrics. The study also acknowledges the ethical considerations and limitations inherent to deep music generation. Recognizing the potential implications of AI-generated music for creative integrity, and the potential for misuse of such technology, we emphasize the need for responsible use. This work contributes to the deep music generation field by establishing novel methodologies, offering robust tools, and providing directions for future research, while promoting collaboration and transparency through the open-source nature of Ainur.

Advisor

Di Eugenio, Barbara

Chair

Di Eugenio, Barbara

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

  • Masters

Degree name

MS, Master of Science

Committee Member

Baralis, Elena; Parde, Natalie

Submitted date

August 2023

Thesis type

application/pdf

Language

  • en
