Enhancing Visual Question Answering with Linguistic Information

Alizadeh, Mehrdad

doi:10.25417/uic.13475715.v1

thesis

Posted on 2020-08-01, 00:00; authored by Mehrdad Alizadeh

Visual Question Answering (VQA) is the task of answering natural language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Although the task is grounded in visual processing, the language understanding component becomes crucial when the question is complex and free-form. In this work, I hypothesize that if the question focuses on events described by verbs, then the model should be aware of verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no existing VQA dataset includes verb semantic information. My first contribution is a new VQA dataset (imSituVQA) that I built by taking advantage of the imSitu annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet. Second, I propose a multi-task CNN-LSTM VQA model that learns to classify the answers as well as the semantic frame elements. The experiments on imSituVQA show that semantic frame element classification helps the VQA system avoid inconsistent responses and improves performance. For VQA datasets that lack such annotations, automatic semantic role labeling offers an alternative way to annotate them approximately. I employed a PropBank-based semantic role labeler to label a subset of the VQA dataset (VQAsub), and then trained the proposed multi-task CNN-LSTM model on VQAsub. The results show a slight improvement over the single-task CNN-LSTM model.
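As a rough illustration of the multi-task idea in the abstract, the sketch below combines an answer-classification loss with a semantic-frame-element classification loss into one training objective. It is not the dissertation's implementation: the function names, the weighted-sum form of the objective, and the `role_weight` parameter are assumptions made here for illustration, and the logits stand in for the outputs of two classifier heads on top of a shared CNN-LSTM encoder.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-likelihood of the target class under softmax(logits)."""
    return -math.log(softmax(logits)[target_index])


def multitask_loss(answer_logits, answer_target,
                   role_logits, role_target, role_weight=0.5):
    """Weighted sum of the two task losses (hypothetical formulation).

    answer_logits / role_logits: scores from the answer head and the
    frame-element head. role_weight is an assumed hyperparameter that
    controls how much the auxiliary task contributes; setting it to 0
    recovers a single-task objective.
    """
    answer_loss = cross_entropy(answer_logits, answer_target)
    role_loss = cross_entropy(role_logits, role_target)
    return answer_loss + role_weight * role_loss
```

With `role_weight=0` the objective reduces to the single-task answer loss, which gives a simple way to compare the two settings under otherwise identical conditions.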

Advisor

Di Eugenio, Barbara

Chair

Di Eugenio, Barbara

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

Doctoral

Degree name

PhD, Doctor of Philosophy

Committee Member

Parde, Natalie; Caragea, Cornelia; Ziebart, Brian; Enis Cetin, Ahmet

Submitted date

August 2020

Thesis type

application/pdf

Language

en

Keywords

Visual Question Answering; Verb Semantics; Data Augmentation; Deep Learning; Multitask Learning

Licence

In Copyright
