University of Illinois at Chicago
PIRAS-THESIS-2018.pdf (575.63 kB)

Detection of Suspicious Users Posting Claims about Cancer on Twitter

thesis
posted on 2018-11-27, 00:00 authored by Massimo Piras
Due to the massive success of social media, online user-generated content has increased exponentially in recent years. Twitter, as a microblogging platform, allows users to share their opinions and activities through short posts called tweets. However, opinion spammers see social networks like Twitter as an opportunity to propagate their ideas, promoting or discrediting a target product or service without revealing their true intentions. In this study, we focused on detecting suspicious users who posted dubious claims about cancer treatment and prevention on Twitter. We addressed the task with a supervised learning approach, framing it as a binary classification problem in which we had to predict whether a user was suspicious or genuine. We collected a set of 60 thousand cancer-related tweets posted in October 2017 by more than 36 thousand users. Since manual labeling could be a very complicated process, we designed a set of features for each user, related both to the content of her posts and to her behavior on Twitter, and combined them to compute a spam score. The basic idea was that suspicious users would have feature distributions different from those of genuine users, which would help us separate the two classes. We then ranked the users by spam score and used the ranking to assign the labels. Finally, we ran a few classifiers on our labeled data, showing that suspicious users had different textual and behavioral patterns which could be used to distinguish them from genuine ones.
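
The labeling idea described in the abstract (per-user features combined into a spam score, a ranking by that score, and labels taken from the ranking) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the three feature names, the min-max normalization, the unweighted average used as the combination rule, and the `top_fraction` cutoff are all hypothetical.

```python
from typing import Dict, List

# Hypothetical per-user features (not the thesis's actual feature set):
# [duplicate-tweet ratio, URLs per tweet, tweets per day]
FEATURES: Dict[str, List[float]] = {
    "a": [0.9, 1.0, 50.0],
    "b": [0.1, 0.2, 3.0],
    "c": [0.5, 0.6, 10.0],
    "d": [0.0, 0.0, 1.0],
}


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def spam_scores(features: Dict[str, List[float]]) -> Dict[str, float]:
    """Average each user's normalized features (assumed combination rule)."""
    users = list(features)
    n_features = len(features[users[0]])
    columns = [
        min_max_normalize([features[u][j] for u in users])
        for j in range(n_features)
    ]
    return {
        u: sum(col[i] for col in columns) / n_features
        for i, u in enumerate(users)
    }


def label_by_rank(scores: Dict[str, float], top_fraction: float = 0.25) -> Dict[str, str]:
    """Label the top of the ranking 'suspicious' and the bottom 'genuine'.

    Users in the middle of the ranking stay 'unlabeled'. The cutoff is an
    assumption; it is capped at half the users so the two ends never overlap.
    """
    ranked = sorted(scores, key=lambda u: scores[u], reverse=True)
    n = len(ranked)
    k = min(max(1, int(n * top_fraction)), n // 2)
    labels = {u: "unlabeled" for u in ranked}
    for u in ranked[:k]:
        labels[u] = "suspicious"
    for u in ranked[n - k:]:
        labels[u] = "genuine"
    return labels


scores = spam_scores(FEATURES)
labels = label_by_rank(scores, top_fraction=0.25)
print(labels)
```

In a setup like this, the users labeled at the two ends of the ranking would serve as training data for the binary classifiers mentioned in the abstract, while the ambiguous middle of the ranking is left out.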

History

Advisor

Liu, Bing

Chair

Liu, Bing

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

  • Masters

Committee Member

Di Eugenio, Barbara; Baralis, Elena

Submitted date

August 2018

Issue date

2018-08-09
