University of Illinois Chicago
Open Classification and Change Detection in the Similarity Space

thesis
posted on 2017-10-27, 00:00 authored by Geli Fei
The rapid emergence of new topics and the highly diverse nature of online text data have brought new challenges to existing text classification techniques. One of the main challenges is their inability to handle unseen classes of documents, a consequence of the closed-world assumption, under which all test classes are assumed to be known at training time. A more realistic scenario, however, is to expect unseen classes during testing (the open world). This problem is called open (world) classification. In this thesis, we study three research problems closely related to open classification. First, we study the problem of text classification under negative covariate shift. We then study the general problem of open (world) classification. Furthermore, we propose cumulative machine learning, in which unseen classes of documents are not only detected but also incorporated into the existing system efficiently. A key technique used throughout this research is the transformation of documents into a similarity space in order to detect a special type of change in the test class distribution, namely the arrival of unseen classes. In the last part of this thesis, we explore the use of similarity-based approaches to detect a new type of change in social media accounts. In particular, we study the problem of detecting online review accounts that have changed hands. Extensive experiments show that the proposed approaches are highly effective.
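
As a rough illustration of the similarity-space idea mentioned in the abstract, the sketch below represents a document by its cosine similarity to the centroid of each known class and rejects it as unseen when no similarity is high enough. This is not the method developed in the thesis; the function names, the toy training data, the bag-of-words features, and the 0.3 threshold are all assumptions chosen for the example.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn a document into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroids(train):
    """Sum the term vectors of each known class's training documents.

    Cosine similarity ignores vector length, so the sum behaves like a mean.
    """
    result = {}
    for label, docs in train.items():
        total = Counter()
        for doc in docs:
            total.update(vectorize(doc))
        result[label] = total
    return result

def to_similarity_space(text, cents):
    """Represent a document by its similarity to each known class centroid."""
    vec = vectorize(text)
    return {label: cosine(vec, c) for label, c in cents.items()}

def classify_open(text, cents, threshold=0.3):
    """Return the closest known class, or 'unseen' if nothing is similar enough.

    The threshold is a hypothetical value picked for this toy data.
    """
    sims = to_similarity_space(text, cents)
    label, best = max(sims.items(), key=lambda kv: kv[1])
    return label if best >= threshold else "unseen"

# Hypothetical toy training set with two known classes.
TRAIN = {
    "sports": ["the team won the match", "the player scored a goal in the match"],
    "cooking": ["bake the bread in the oven", "stir the soup and add salt"],
}
CENTS = centroids(TRAIN)

print(classify_open("the player won the match", CENTS))
print(classify_open("quantum physics lecture notes", CENTS))
```

The raw term counts used here let common words such as "the" inflate similarities, so a realistic version would likely need stop-word removal or TF-IDF weighting and a threshold tuned on held-out data.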

History

Advisor

Liu, Bing

Chair

Liu, Bing

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

  • Doctoral

Committee Member

Di Eugenio, Barbara; Gmytrasiewicz, Piotr; Yu, Philip S; Mahmud, Jalal

Submitted date

May 2017

Issue date

2017-03-20
