Towards an Improved Model for Visual Storytelling

Modi, Yatri Manoj

doi:10.25417/uic.13475751.v1

thesis

posted on 2020-05-01, 00:00 authored by Yatri Manoj Modi

Visual storytelling is an intriguing and complex task that only recently entered the language and vision research arena. The task focuses on generating human-like, coherent, and visually grounded stories from a sequence of images while maintaining context across those images. In this study, I survey recent advances in the field and conduct a thorough error analysis of three approaches to visual storytelling. I categorize common types of errors, provide examples of each, and identify key shortcomings in prior work. I then make recommendations for addressing these limitations and propose an improved model for visual storytelling: a hierarchical encoder-decoder network with co-attention over the images and their literal natural language descriptions. I assess the performance of this model at generating visual stories. Finally, I experiment with a novel metric, BERTScore (Zhang et al., 2019), as an alternative to human evaluation.

History

Advisor

Parde, Natalie

Chair

Parde, Natalie

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

Masters

Degree name

MS, Master of Science

Committee Member

Di Eugenio, Barbara; Ravi, Sathya

Submitted date

May 2020

Thesis type

application/pdf

Language

en

Keywords

Natural Language Processing; Computer Vision; Deep Learning; Computer Science; Artificial Intelligence

Licence

In Copyright
