Debiasing 2016 Twitter Election Analysis via Multi-Level Regression and Poststratification (MRP)
Thesis posted on 2019-08-06, authored by Shoaib Khan
Sentiment analysis of social media is widely used to gauge the views of the general population. However, according to the Pew Research Center, social media users are not a representative sample of the US population, and this sampling bias can skew the results of any analysis built on their posts. Very few studies have attempted to account for demographic biases among Twitter users. By combining approaches from computer science and statistics, we propose a simple but powerful two-step approach to address this gap. The first step predicts the demographic attributes and sentiment of social media users from their follower networks and tweets. The second step applies multilevel regression and poststratification (MRP), a well-known statistical approach for debiasing data, to estimate the actual proportion of the population holding a particular view. Using the prediction of poll results for key states in the 2016 US presidential election as a case study, we show that social media data can yield predictions similar to those of polls.
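The idea behind the second step can be illustrated with a minimal Python sketch. It assumes the first step has already assigned each user a demographic cell (here only an age group) and a binary sentiment label (1 = positive toward a candidate). As a deliberate simplification, the multilevel regression is replaced by beta-binomial partial pooling of cell means toward the overall mean; the cell names, sample, census counts, and `prior_strength` value are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def partial_pool(samples, prior_strength=10.0):
    """Estimate support per demographic cell, shrunk toward the overall mean.

    samples: list of (cell, y) pairs with y in {0, 1}.
    A cell's estimate is (positives + k * overall) / (n + k), i.e. the
    posterior mean under a Beta prior centred on the overall mean.
    This is a simple stand-in for a multilevel logistic regression.
    """
    if not samples:
        raise ValueError("need at least one sample")
    overall = sum(y for _, y in samples) / len(samples)
    counts = defaultdict(int)
    positives = defaultdict(int)
    for cell, y in samples:
        counts[cell] += 1
        positives[cell] += y
    estimates = {
        cell: (positives[cell] + prior_strength * overall) / (counts[cell] + prior_strength)
        for cell in counts
    }
    return estimates, overall


def poststratify(cell_estimates, population, fallback):
    """Weight cell estimates by census cell counts.

    Cells present in the census but absent from the sample fall back
    to `fallback` (here, the overall sample mean).
    """
    total = sum(population.values())
    weighted = sum(n * cell_estimates.get(cell, fallback) for cell, n in population.items())
    return weighted / total


# Hypothetical Twitter sample, skewed toward young users:
# (age group predicted in step 1, 1 = positive sentiment toward a candidate).
SAMPLES = (
    [("18-29", 1)] * 40 + [("18-29", 0)] * 20
    + [("30-49", 1)] * 15 + [("30-49", 0)] * 15
    + [("50+", 1)] * 3 + [("50+", 0)] * 7
)
# Hypothetical census counts for the same cells in one state.
POPULATION = {"18-29": 200, "30-49": 350, "50+": 450}

cell_estimates, raw_mean = partial_pool(SAMPLES)
estimate = poststratify(cell_estimates, POPULATION, fallback=raw_mean)
print(f"raw sample mean:     {raw_mean:.3f}")
print(f"MRP-style estimate:  {estimate:.3f}")
```

In this toy example the raw sample mean is 0.580 because young users dominate the sample, while reweighting the pooled cell estimates by the census counts lowers the estimate to about 0.511. A full MRP analysis would instead fit a multilevel logistic regression, for example with group-level effects for each demographic variable and state, and then poststratify its predictions in the same way.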