Topic modeling algorithms promise to uncover the underlying semantics of large collections of documents, making them an effective tool for discovering knowledge online. In this thesis, we apply topic modeling algorithms to solve two major tasks.
The first task is to identify diversionary comments under blog posts. Diversionary comments are defined as comments that divert the discussion away from the topic of the original post, possibly to distract readers and draw their attention to a new topic. We categorize diversionary comments into five types based on our observations, and propose an effective framework to identify and flag them. Our approach combines coreference resolution, extraction from Wikipedia, and topic modeling algorithms to capture the underlying topics in each comment and in the post. We solve the problem in two different ways: (i) rank all the comments in descending order of how diversionary they are; (ii) treat it as a classification problem, distinguishing diversionary comments from non-diversionary ones.
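The ranking formulation (i) can be illustrated with a minimal sketch: once a topic model has assigned a topic distribution to the post and to each comment, comments can be ranked by how far their distributions diverge from the post's. The distance measure (Jensen-Shannon divergence) and the toy distributions below are illustrative assumptions, not the thesis's exact scoring function.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two topic distributions
    (illustrative choice of distance; symmetric and bounded)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def rank_diversionary(post_topics, comment_topics):
    """Return comment indices in descending order of divergence
    from the post's topic distribution."""
    scores = [(js_divergence(post_topics, t), i)
              for i, t in enumerate(comment_topics)]
    return [i for _, i in sorted(scores, reverse=True)]

# Toy example: 3-topic distributions for a post and two comments.
post = [0.7, 0.2, 0.1]
comments = [
    [0.65, 0.25, 0.10],  # close to the post: likely on-topic
    [0.05, 0.15, 0.80],  # far from the post: likely diversionary
]
ranking = rank_diversionary(post, comments)  # most diversionary first
```

In this toy setting the second comment, whose topic mass lies on a topic the post barely mentions, is ranked first.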
The second task is to design a sense-topic model that induces the senses of ambiguous words in a corpus. Since sense and topic are related but distinct linguistic phenomena, we treat them as two separate latent variables in our model: topics are inferred from the entire document, while senses are inferred from the local context surrounding the ambiguous word. To relate the sense and topic variables, we take inspiration from dependency networks and draw a bidirectional edge between them. We also present unsupervised ways of enriching the original dataset, including using neural word embeddings and external Web-scale corpora to enrich the context of each data instance or to add more instances.
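The embedding-based context enrichment can be sketched as follows: each word in an instance's local context is expanded with its nearest neighbors in embedding space, giving the sense model more evidence to work with. The toy two-dimensional embeddings and the neighbor count k are assumptions for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def enrich_context(context, embeddings, k=2):
    """Expand a local context with the k nearest-neighbor words
    (by cosine similarity) of each context word."""
    enriched = list(context)
    for word in context:
        if word not in embeddings:
            continue
        neighbors = sorted(
            (w for w in embeddings if w != word and w not in enriched),
            key=lambda w: cosine(embeddings[word], embeddings[w]),
            reverse=True,
        )
        enriched.extend(neighbors[:k])
    return enriched

# Toy 2-d embeddings (hypothetical values for illustration).
toy = {
    "bank":  [0.9, 0.1],
    "money": [0.8, 0.2],
    "loan":  [0.7, 0.3],
    "river": [0.1, 0.9],
}
enriched = enrich_context(["money"], toy, k=1)  # adds the closest neighbor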