Posted on 2022-08-01, authored by Abhinav Kumar
Visualization is a core component of exploring data, providing a guided search through the data to gain meaningful insight and verify hypotheses. However, constructing visualizations is also cognitively demanding, particularly for users who are new to visualization. Such users tend to struggle to translate hypotheses and relevant questions into data attributes, select only familiar plot types such as bar, line, and pie charts, and may misinterpret visualizations once they are created. Although visualization tools can alleviate these problems by performing some of the decision making involved in constructing a visualization, their software interfaces impose a steep learning curve. Motivated by the commercial success of natural language processing in technologies such as the iPhone, Android, and Alexa, there has been an increased focus on building natural language interfaces to assist the user in data exploration. The broader goal of our research is to develop a natural language interface, implemented with a dialogue system architecture, that is capable of conversing with humans through language and hand-pointing gestures.
In this thesis, we present our approach to modeling the dialogue. In particular, we built a multi-modal dialogue corpus for exploring data visualizations, capturing user speech and hand gestures during the conversation. As part of this work we also developed a natural language interface capable of processing both speech and gestures. Evaluation of our visualization system with real subjects in a live environment shows that it is effective in interpreting speech and pointing gestures, as well as in generating appropriate visualizations. We also focused on advanced models for language processing. We modeled user intent by supplying our language interpretation module with the utterances surrounding the current request, giving it context for interpretation. We also addressed the small size of our corpus by developing a paraphrasing approach to augment it, which improved user intent prediction. Since users can switch between making requests and thinking aloud, we additionally modeled the ability to differentiate between the two and used it to segment incoming utterances into the request and the surrounding think-aloud. Finally, we developed an approach to recognizing visualization-referring language and pointing gestures and resolving them to the appropriate target visualization on the screen, as well as creating new visualizations using the referred-to visualization as a template. We performed intrinsic evaluations of these individual models, as well as an incremental evaluation of the end-to-end system, indicating effectiveness in predicting user intent and in recognizing and resolving visualization-referring language and pointing gestures.
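The abstract describes these models only at a high level. As a purely illustrative sketch of the final step, resolving a pointing gesture to an on-screen visualization can be approximated as a nearest-chart lookup in screen coordinates; the Visualization class, the resolve_pointing function, and the distance heuristic below are hypothetical and are not taken from the thesis.

    from dataclasses import dataclass

    @dataclass
    class Visualization:
        """A visualization currently shown on screen (hypothetical structure)."""
        vis_id: str
        x: float       # left edge, in screen coordinates
        y: float       # top edge
        width: float
        height: float

    def distance_to_center(vis: Visualization, px: float, py: float) -> float:
        """Euclidean distance from the pointing location to the chart's center."""
        cx = vis.x + vis.width / 2
        cy = vis.y + vis.height / 2
        return ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5

    def resolve_pointing(visualizations: list[Visualization],
                         px: float, py: float) -> Visualization | None:
        """Resolve a pointing gesture at (px, py) to the nearest visualization.

        A full system would combine this spatial evidence with the referring
        language (e.g., "that bar chart"); distance alone is used here for brevity.
        """
        if not visualizations:
            return None
        return min(visualizations, key=lambda v: distance_to_center(v, px, py))

    # Example: two charts on screen; the user points near the right one.
    screen = [Visualization("vis-1", x=0, y=0, width=400, height=300),
              Visualization("vis-2", x=500, y=0, width=400, height=300)]
    target = resolve_pointing(screen, px=650, py=120)
    print(target.vis_id)  # vis-2

In the actual system, the referring expression in the utterance would further constrain the candidate set; the spatial heuristic above merely stands in for that combination of linguistic and gestural evidence.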
History
Advisor
Di Eugenio, Barbara
Chair
Di Eugenio, Barbara
Department
Department of Computer Science
Degree Grantor
University of Illinois at Chicago
Degree Level
Doctoral
Degree name
PhD, Doctor of Philosophy
Committee Member
Johnson, Andrew
Parde, Natalie
Caragea, Cornelia
Georgila, Kallirroi