University of Illinois at Chicago
MODI-THESIS-2020.pdf (4.21 MB)

Towards an Improved Model for Visual Storytelling

Posted on 2020-05-01, authored by Yatri Manoj Modi
Visual storytelling is an intriguing and complex task that only recently entered the language and vision research arena. The task focuses on generating human-like, coherent, and visually grounded stories from a sequence of images while maintaining context across those images. In this study, I survey recent advances in the field and conduct a thorough error analysis of three approaches to visual storytelling. I categorize and provide examples of common types of errors and identify key shortcomings in prior work. I then make recommendations for addressing these limitations and propose an improved model for visual storytelling: a hierarchical encoder-decoder network with co-attention over the images and their literal natural language descriptions. I assess the performance of this model at generating visual stories. Finally, I experiment with a novel metric, BERTScore (Zhang et al., 2019), as an alternative to human evaluation.
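BERTScore (Zhang et al., 2019), the metric mentioned in the abstract, compares a generated story with a reference story by matching their token embeddings. The sketch below illustrates only its greedy-matching core on hand-made vectors; it is not the evaluation code used in the thesis. It assumes pre-computed, non-zero token embeddings (real BERTScore obtains contextual embeddings from a pretrained BERT-style model and can add idf weighting and baseline rescaling, all omitted here), and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bertscore_from_embeddings(
    candidate: List[Vector], reference: List[Vector]
) -> Tuple[float, float, float]:
    """Hypothetical helper: greedy-matching precision, recall, and F1.

    Assumes one pre-computed embedding per token. Precision matches every
    candidate token to its most similar reference token; recall matches
    every reference token to its most similar candidate token.
    """
    # Pairwise similarity matrix: rows = candidate tokens, columns = reference tokens.
    sim = [[cosine(c, r) for r in reference] for c in candidate]
    precision = sum(max(row) for row in sim) / len(candidate)
    recall = sum(
        max(sim[i][j] for i in range(len(candidate))) for j in range(len(reference))
    ) / len(reference)
    # Guard against a zero denominator when nothing matches.
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy 2-d "embeddings": the candidate covers two of the three reference tokens exactly.
    cand = [[1.0, 0.0], [0.0, 1.0]]
    ref = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(bertscore_from_embeddings(cand, ref))
```

In this toy case precision is 1.0 because every candidate token has an exact match, while recall drops below 1.0 because the third reference token is only partially matched; F1 balances the two.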



Advisor

Parde, Natalie


Chair

Parde, Natalie


Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

  • Masters

Degree name

MS, Master of Science

Committee Member

Di Eugenio, Barbara; Ravi, Sathya

Submitted date

May 2020

Thesis type



Language

  • English
