The rapid emergence of new topics and the highly diverse nature of online text data have brought new challenges to existing text classification techniques. One of the main challenges is their lack of ability in handling unseen classes of documents due to the closed world assumption, under which all test classes are assumed to be known at training time. However, a more realistic scenario is to expect unseen classes during testing (open world). This problem is called open (world) classification. In this thesis, we start with studying three closely related research problems to open classification. First, we study the problem of text classification under negative covariate shift. Then we proceed to study the general problem of open (world) classification. Furthermore, we propose cumulative machine learning, where unseen classes of documents are not only detected, but also incorporated into the existing system in an efficient manner. One of the key techniques used in the above research is the transformation of documents to a similarity space to detect the special type of change in the test class distribution, i.e., the arrival of unseen classes. As the last part of this thesis, we explore the use of similarity-based approaches in detecting a new type of change in social media accounts. In particular, we study the problem of detecting changed-hands online review accounts. Extensive experiments have shown that the proposed approaches are highly effective.
History
Advisor
Liu, Bing
Chair
Liu, Bing
Department
Computer Science
Degree Grantor
University of Illinois at Chicago
Degree Level
Doctoral
Committee Member
Di Eugenio, Barbara
Gmytrasiewicz, Piotr
Yu, Philip S
Mahmud, Jalal