University of Illinois Chicago

Towards an Improved Model for Visual Storytelling

thesis
Posted on 2020-05-01, authored by Yatri Manoj Modi
Visual storytelling is an intriguing and complex task that only recently entered the language and vision research arena. The task focuses on generating human-like, coherent, and visually grounded stories from a sequence of images while maintaining context across those images. In this study, I survey recent advances in the field and conduct a thorough error analysis of three approaches to visual storytelling. I categorize and provide examples of common types of errors, and identify key shortcomings in prior work. I then make recommendations for addressing these limitations and propose an improved model for visual storytelling: a hierarchical encoder-decoder network with co-attention over the images and their literal natural language descriptions. I assess the performance of this model at generating visual stories. Finally, I experiment with a novel metric, BERTScore (Zhang et al., 2019), as an alternative to human evaluation.
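BERTScore compares a generated text with a reference by matching tokens through their embeddings rather than through exact n-gram overlap. The sketch below shows only its greedy-matching core: each reference token is paired with its most similar candidate token (recall), each candidate token with its most similar reference token (precision), and the two are combined into F1. It assumes token embeddings have already been produced by some encoder; the toy 2-d vectors in the usage example are hypothetical stand-ins for contextual BERT embeddings, and `bertscore_greedy` is an illustrative name, not the API of any library. The optional idf weighting and baseline rescaling described by Zhang et al. (2019) are omitted, and this is not necessarily how the thesis computed the metric.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _cosine_matrix(ref: Sequence[Vector], cand: Sequence[Vector]) -> List[List[float]]:
    """Return sim[i][j] = cosine(reference token i, candidate token j)."""
    ref_n = [_normalize(v) for v in ref]
    cand_n = [_normalize(v) for v in cand]
    return [[sum(a * b for a, b in zip(r, c)) for c in cand_n] for r in ref_n]


def bertscore_greedy(
    ref_embs: Sequence[Vector], cand_embs: Sequence[Vector]
) -> Tuple[float, float, float]:
    """Hypothetical helper: greedy-matching precision, recall, and F1
    over precomputed token embeddings (no idf weighting, no rescaling)."""
    if not ref_embs or not cand_embs:
        raise ValueError("both texts need at least one token embedding")
    sim = _cosine_matrix(ref_embs, cand_embs)

    # Recall: every reference token is matched to its best candidate token.
    recall = sum(max(row) for row in sim) / len(ref_embs)

    # Precision: every candidate token is matched to its best reference token.
    precision = sum(
        max(sim[i][j] for i in range(len(ref_embs))) for j in range(len(cand_embs))
    ) / len(cand_embs)

    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 0.0 if total == 0 else 2 * precision * recall / total
    return precision, recall, f1


if __name__ == "__main__":
    reference = [[1.0, 0.0], [0.0, 1.0]]  # two reference tokens
    candidate = [[1.0, 0.0]]              # one candidate token
    p, r, f = bertscore_greedy(reference, candidate)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

In the usage example, the single candidate token exactly matches one of the two reference tokens, so precision is 1.0 while recall is 0.5: the candidate says nothing wrong but omits half of the reference. This asymmetry is part of why BERTScore reports precision and recall separately as well as F1.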

Language

  • en

Advisor

Parde, Natalie

Chair

Parde, Natalie

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

  • Masters

Degree name

MS, Master of Science

Committee Members

  • Di Eugenio, Barbara
  • Ravi, Sathya

Submitted date

May 2020

Thesis type

application/pdf