Detecting Opinion Spam in Commercial Review Websites

Li, Huayi

thesis

posted on 2016-07-01, 00:00 authored by Huayi Li

Review websites have become important platforms for consumers to compare and evaluate products and services. However, review systems are often targeted by opinion spam. Although opinion spam detection has attracted significant research attention in recent years, the problem is far from solved. One key reason is that no large-scale ground-truth datasets are available for model building and evaluation, so most existing approaches rely on pseudo fake reviews rather than real-world fake reviews. This dissertation presents my Ph.D. research on opinion spam detection in commercial review websites. Our dataset was shared by Dianping, the largest review hosting site in China. First, we present a large-scale analysis of restaurant reviews filtered by Dianping's fake review detection system. We discover several novel temporal and spatial patterns that demonstrate fundamental differences between spammers and non-spammers. Second, we find that the filtering system has high precision but unknown recall. We therefore propose a Collective Positive and Unlabeled Learning framework to improve supervised learning and relational classification. Lastly, we introduce a new way of modeling reviewers' posting activities and propose a novel co-burst network that is superior to the traditional reviewer-product network in detecting spammer groups. We believe that the models developed in this work can benefit Dianping's spam detection system as well as many other platforms. The findings and experimental results can help not only to detect opinion spam but also to prevent it by increasing the cost of spamming.

History

Advisor

Liu, Bing

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

Doctoral

Committee Member

Buy, Ugo A.; Kanich, Chris; Yu, Philip S.; Emery, Sherry L.

Submitted date

2016-05

Language

en

Issue date

2016-07-01

Keywords

Opinion Spam; Fake Reviews; Classification; Data Mining

Licence

In Copyright
