University of Illinois Chicago

Speaker Orientation Estimation from Spatial and Spectral Features Using a Deep Neural Network

thesis
posted on 2025-05-01, 00:00 authored by Shiyu Liu
Accurate estimation of speaker orientation benefits applications such as hearing aids, teleconferencing systems, and voice-controlled interfaces. Orientation information can help a hearing aid decide whether to enhance a given talker, improve camera tracking and audio quality in teleconferencing systems, and help a voice interface decide whether to respond. This thesis introduces a deep neural network method that combines spatial and spectral audio features for speaker orientation estimation. Spatial features are derived using a weighted Generalized Cross-Correlation with Phase Transform (GCC-PHAT) technique applied to three microphones placed around the speaker. Spectral features, extracted from Mel spectrograms, capture the directivity patterns of human speech. The proposed method achieves significantly lower estimation error than approaches that use a single feature type. Experimental results validate the effectiveness of the combined features and indicate that the method is suitable for real-world implementations.
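
To illustrate the spatial feature named in the abstract, the following is a minimal sketch of standard, unweighted GCC-PHAT between microphone pairs. It is not the thesis's implementation: the abstract does not specify the weighting scheme, the microphone geometry, or the network input format, so the function name `gcc_phat`, the 16 kHz sampling rate, and the simulated per-microphone delays are all assumptions made for illustration.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` in seconds using GCC-PHAT.

    Returns (tau, cc), where cc is the correlation over lags
    -max_shift..+max_shift. Plain PHAT only; no extra weighting is applied.
    """
    n = len(sig) + len(ref)  # zero-pad so the circular correlation does not wrap
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)
    cross = sig_f * np.conj(ref_f)
    # PHAT weighting: discard magnitude and keep only the phase
    cross /= np.maximum(np.abs(cross), 1e-12)
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index 0 is lag -max_shift and index max_shift is lag 0
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs, cc


# Simulated three-microphone capture (hypothetical delays, in samples)
fs = 16000
rng = np.random.default_rng(0)
source = rng.standard_normal(2048)
delays = [0, 3, 7]
mics = [np.concatenate((np.zeros(d), source[:len(source) - d])) for d in delays]

# Pairwise lag estimates; the true values are delays[j] - delays[i]
pairs = [(0, 1), (0, 2), (1, 2)]
lags = {(i, j): round(gcc_phat(mics[j], mics[i], fs)[0] * fs) for i, j in pairs}
```

In a combined-feature system of the kind the abstract describes, the correlation vectors `cc` for each microphone pair could plausibly be stacked and fed to a network alongside spectral features, but how the thesis actually does this is not stated on this page.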


Advisor

Ryan M Corey

Department

Electrical and Computer Engineering

Degree Grantor

University of Illinois Chicago

Degree Level

  • Masters

Degree name

MS, Master of Science

Committee Member

Ahmet Enis Cetin, Aritra Banerjee

Thesis type

application/pdf

Language

  • en
