posted on 2015-02-27, 00:00 authored by Jalal S. Alowibdi
Online Social Networks (OSNs) play a significant role in the daily life
of hundreds of millions of people. However, many user profiles in
OSNs contain deceptive information. Existing studies have shown that
lying in OSNs is quite widespread, often to protect a user's
privacy. In this dissertation, we propose a novel approach for detecting
deceptive profiles in OSNs. Our ultimate goal is to find deceptive
information about user gender and location. We specifically define a
set of analysis methods for detecting deceptive information about user
genders and locations in Twitter. First, we collected a large dataset
of Twitter profiles and tweets. Next, we defined methods for gender
guessing from Twitter profile colors and names. Our methods are quite
scalable because we avoid the analysis of text messages, which typically
involves high computational complexity. We applied a number of
preprocessing methods to raw Twitter data in ways that significantly
enhanced the accuracy of our predictions. Subsequently, we applied
Bayesian classification and K-means clustering algorithms to Twitter
profile characteristics (e.g., profile layout colors, first names, user
names) and geolocations to analyze user behavior. We established the
overall accuracy of each gender indicator through extensive
experimentation on our crawled dataset.
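
To make the classification step more concrete, the following is a minimal sketch of how a Bayesian classifier might combine a profile color and a first name into a gender guess. It is an illustration under stated assumptions, not the dissertation's implementation: the `quantize_color` helper, the `NaiveBayes` class, the four-level color quantization, the Laplace smoothing constant, and the four toy profiles are all hypothetical and chosen only to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict


def quantize_color(hex_color, levels=4):
    """Reduce a 24-bit hex color such as 'FF69B4' to a coarse (r, g, b) bucket.

    Hypothetical preprocessing step: fewer distinct colors means each
    bucket is observed more often in the training data.
    """
    hex_color = hex_color.lstrip("#")
    step = 256 // levels
    r, g, b = (int(hex_color[i:i + 2], 16) // step for i in (0, 2, 4))
    return (r, g, b)


class NaiveBayes:
    """Categorical naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        # (label, feature index) -> counts of each observed value
        self.feature_counts = defaultdict(Counter)
        # feature index -> set of all values seen during training
        self.vocab = defaultdict(set)

    def fit(self, rows, labels):
        for features, label in zip(rows, labels):
            self.class_counts[label] += 1
            for i, value in enumerate(features):
                self.feature_counts[(label, i)][value] += 1
                self.vocab[i].add(value)
        return self

    def predict(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / total)
            for i, value in enumerate(features):
                seen = self.feature_counts[(label, i)][value]
                # One extra slot in the denominator reserves mass for unseen values.
                denom = count + self.alpha * (len(self.vocab[i]) + 1)
                score += math.log((seen + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy profiles: (background color, first name, self-declared gender).
profiles = [
    ("FF69B4", "emma", "female"),
    ("FFC0CB", "olivia", "female"),
    ("0000FF", "james", "male"),
    ("1E90FF", "liam", "male"),
]

rows = [(quantize_color(color), name) for color, name, _ in profiles]
labels = [gender for _, _, gender in profiles]
model = NaiveBayes().fit(rows, labels)

guess = model.predict((quantize_color("FF69B4"), "emma"))
```

In a setting like the one described above, a mismatch between such a guess and the gender a profile declares could serve as one signal of possible deception. The sketch reads only short profile fields, which is consistent with the scalability argument about avoiding tweet text.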