MODI-THESIS-2020.pdf (4.21 MB)
Towards an Improved Model for Visual Storytelling
thesis
Posted on 2020-05-01, 00:00, authored by Yatri Manoj Modi

Visual storytelling is an intriguing and complex task that only recently entered the language and vision research arena. The task focuses on generating human-like, coherent and visually grounded stories from a sequence of images while maintaining the context over these images. In this study I survey recent advances in the field and conduct a thorough error analysis of three approaches to visual storytelling. I categorize and provide examples of common types of errors, and identify key shortcomings in prior work. Later, I make recommendations for addressing these limitations, and propose an improved model for visual storytelling: a hierarchical encoder-decoder network, with co-attention over the images and their natural language literal descriptions. I assess the performance of this model at generating visual stories. Finally, I experiment with a novel metric, BertScore (Zhang et al., 2019), as an alternative to human evaluation.
Advisor
Parde, Natalie
Chair
Parde, Natalie
Department
Computer Science
Degree Grantor
University of Illinois at Chicago
Degree Level
- Masters
Degree name
MS, Master of Science
Committee Member
- Di Eugenio, Barbara
- Ravi, Sathya
Submitted date
May 2020
Thesis type
application/pdf
Language
- en