University of Illinois Chicago

Detecting Opinion Spam in Commercial Review Websites

thesis
posted on 2016-07-01, authored by Huayi Li
Review websites have become important platforms for consumers to compare and evaluate products and services. However, review systems are often targeted by opinion spam. Although opinion spam detection has attracted significant research attention in recent years, the problem is far from solved. One key reason is that no large-scale ground-truth datasets are available for model building and evaluation: most existing approaches use pseudo-fake reviews rather than real-world fake reviews. This dissertation presents my Ph.D. research on opinion spam detection in commercial review websites. Our dataset was shared by Dianping, the largest review hosting site in China. First, we present a large-scale analysis of restaurant reviews filtered by Dianping's fake review detection system. We discover several novel temporal and spatial patterns that demonstrate fundamental differences between spammers and non-spammers. Second, we find that the filtering system has high precision but unknown recall. We therefore propose a Collective Positive and Unlabeled Learning framework to improve supervised learning and relational classification. Last, we introduce a new way of modeling reviewers' posting activities and propose a novel co-burst network that is superior to the traditional reviewer-product network in detecting spammer groups. Through this work, we believe that our models can benefit Dianping's spam detection system as well as many other platforms. The findings and experimental results can help not only to detect opinion spam but also to prevent it by increasing the cost of spamming.

History

Advisor

Liu, Bing

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

  • Doctoral

Committee Member

Buy, Ugo A.; Kanich, Chris; Yu, Philip S.; Emery, Sherry L.

Submitted date

2016-05

Language

  • en

Issue date

2016-07-01
