Open Classification and Change Detection in the Similarity Space

2017-10-27T00:00:00Z (GMT) by Geli Fei
The rapid emergence of new topics and the highly diverse nature of online text data pose new challenges for existing text classification techniques. One of the main challenges is their inability to handle unseen classes of documents, a consequence of the closed world assumption, under which all test classes are assumed to be known at training time. A more realistic scenario is to expect unseen classes during testing (the open world). This problem is called open (world) classification. In this thesis, we begin by studying three research problems closely related to open classification. First, we study text classification under negative covariate shift. We then study the general problem of open (world) classification. Furthermore, we propose cumulative machine learning, in which unseen classes of documents are not only detected but also incorporated into the existing system efficiently. A key technique in this research is the transformation of documents into a similarity space in order to detect a special type of change in the test class distribution, namely the arrival of unseen classes. In the last part of this thesis, we explore similarity-based approaches for detecting a new type of change in social media accounts: in particular, we study the problem of detecting online review accounts that have changed hands. Extensive experiments show that the proposed approaches are highly effective.
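
As a rough illustration of the similarity-space idea, and not the method developed in the thesis, the sketch below represents each document by its cosine similarity to the center of every known class and flags it as unseen when no similarity is high enough. The function names, the toy training data, the bag-of-words representation, and the 0.3 threshold are all assumptions made for this example.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a whitespace-tokenized document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def class_centers(train):
    """Sum the term counts of each known class to form one center per class."""
    centers = {}
    for text, label in train:
        centers.setdefault(label, Counter()).update(vectorize(text))
    return centers


def to_similarity_space(text, centers):
    """Map a document to one feature per known class: its similarity to that center."""
    doc = vectorize(text)
    return {label: cosine(doc, center) for label, center in centers.items()}


def classify_open(text, centers, threshold=0.3):
    """Return the most similar known class, or "unseen" if no class is similar enough.

    The threshold is a hypothetical value chosen for this toy data.
    """
    sims = to_similarity_space(text, centers)
    label, best = max(sims.items(), key=lambda kv: kv[1])
    return label if best >= threshold else "unseen"


# Toy training set with two known classes (illustrative data only).
TRAIN = [
    ("team won football match", "sports"),
    ("player scored goal", "sports"),
    ("senate passed budget bill", "politics"),
    ("president signed bill", "politics"),
]
centers = class_centers(TRAIN)

print(classify_open("striker scored goal in match", centers))
print(classify_open("new smartphone camera review", centers))
```

A fixed threshold on the maximum similarity is the simplest possible rejection rule; a learned classifier over the similarity features would be a natural refinement, but that choice is outside what this abstract specifies.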