Review websites have become important platforms for consumers to compare and evaluate products and services. However, review systems are often targeted by opinion spam. Although opinion spam detection has attracted significant research attention in recent years, the problem is far from solved. One key reason is that no large-scale ground-truth datasets are available for model building and evaluation; most existing approaches use pseudo fake reviews rather than real-world fake reviews. This dissertation presents my Ph.D. research on opinion spam detection in commercial review websites. Our dataset was shared by Dianping, the largest review hosting site in China. First, we present a large-scale analysis of restaurant reviews filtered by Dianping's fake review detection system. We discover several novel temporal and spatial patterns that demonstrate fundamental differences between spammers and non-spammers. Second, we find that the filtering system has high precision but unknown recall. We therefore propose a Collective Positive and Unlabeled Learning framework to improve supervised learning and relational classification. Finally, we introduce a new way of modeling reviewers' posting activities and propose a novel co-burst network that is superior to the traditional reviewer-product network in detecting spammer groups. We believe that our models can benefit Dianping's spam detection system as well as many other platforms. The findings and experimental results can help not only to detect opinion spam but also to prevent it by increasing the cost of spamming.
Advisor
Liu, Bing
Department
Computer Science
Degree Grantor
University of Illinois at Chicago
Degree Level
Doctoral
Committee Member
Buy, Ugo A.
Kanich, Chris
Yu, Philip S.
Emery, Sherry L.