Open Classification and Change Detection in the Similarity Space

Fei, Geli

FEI-DISSERTATION-2017.pdf (1.17 MB)

thesis

Posted on 2017-10-27, 00:00; authored by Geli Fei

The rapid emergence of new topics and the highly diverse nature of online text data pose new challenges for existing text classification techniques. A central challenge is that these techniques cannot handle unseen classes of documents, because they rely on the closed-world assumption, under which every class that appears at test time is assumed to be known at training time. A more realistic scenario, the open world, is to expect unseen classes during testing. This problem is called open (world) classification. In this thesis, we begin by studying three research problems closely related to open classification. First, we study text classification under negative covariate shift. Second, we study the general problem of open (world) classification. Third, we propose cumulative machine learning, in which unseen classes of documents are not only detected but also incorporated into the existing system efficiently. A key technique in this research is transforming documents into a similarity space in order to detect a special type of change in the test class distribution, namely the arrival of unseen classes. In the last part of the thesis, we explore similarity-based approaches for detecting a new type of change in social media accounts; specifically, we study the problem of detecting online review accounts that have changed hands. Extensive experiments show that the proposed approaches are highly effective.
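The abstract does not specify the document representation, the similarity function, or the decision rule, so the following is only a minimal illustrative sketch of the general idea of classifying in a similarity space. It is not the method of the thesis. It assumes bag-of-words vectors, cosine similarity to per-class centroids, and a fixed rejection threshold; all function names and the threshold value are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def class_centroids(labeled_docs):
    """Sum term counts per class. Cosine similarity is scale-invariant,
    so the summed vector points in the same direction as the mean."""
    centroids = {}
    for label, text in labeled_docs:
        centroids.setdefault(label, Counter()).update(vectorize(text))
    return centroids


def to_similarity_space(text, centroids):
    """Represent a document by its similarity to each known class
    instead of by its raw term counts."""
    vec = vectorize(text)
    return {label: cosine(vec, c) for label, c in centroids.items()}


def classify_open(text, centroids, threshold=0.2):
    """Return the most similar known class, or "unseen" when the document
    is not similar enough to any of them (assumed threshold rule)."""
    sims = to_similarity_space(text, centroids)
    label, best = max(sims.items(), key=lambda kv: kv[1])
    return label if best >= threshold else "unseen"
```

Under these assumptions, a document that shares no vocabulary with any training class has similarity 0 to every centroid and is rejected as unseen, whereas a closed-world classifier would be forced to assign it to one of the known classes.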

History

Advisor

Liu, Bing

Chair

Liu, Bing

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

Doctoral

Committee Member

Di Eugenio, Barbara; Gmytrasiewicz, Piotr; Yu, Philip S; Mahmud, Jalal

Submitted date

May 2017

Issue date

2017-03-20

Keywords

Open classification; Covariate shift; Cumulative learning; Spam detection; Change detection

Licence

In Copyright
