Probabilistic Models for Fine-Grained Opinion Mining: Algorithms and Applications
Public sentiments in online debates, discussions, and comments are crucial to governmental agencies for passing new bills and policies, gauging upheaval, predicting elections, etc. However, to leverage the sentiments expressed in social opinions, we face two major challenges: (1) fine-grained opinion mining, and (2) filtering opinion spam to ensure credible opinion mining. We start with mining opinions from social conversations. We focus on fine-grained sentiment dimensions such as agreement (e.g., “I’d agree”) and disagreement (e.g., “I refute”). This is a major departure from the traditional polar (positive/negative) sentiments (e.g., good, nice vs. poor, bad) in standard opinion mining. In the domain of debates, joint topic and sentiment models are proposed to discover agreement and disagreement expressions, as well as contention points/topics, at both the discussion level and the individual post level. The proposed models also encode interactions among discussants through quoting and replying relations. Next, we address the problem of semantic incoherence in aspect extraction by knowledge induction using seeds. Seeds are user-defined coarse groupings that guide the modeling process. Specifically, we build on topic models to propose novel aspect-specific sentiment models guided by aspect seeds. The latter part of this thesis proposes solutions for detecting opinion spam. Opinion spam refers to “illegitimate” human activities (e.g., writing fake reviews) that try to mislead readers by giving undeserved opinions/ratings to some entities (e.g., hotels, products) to promote or demote them. We address two problems in opinion spam. The first is the problem of group spam, i.e., a group of spammers working in collusion. A novel relational ranking algorithm called GSRank is proposed for ranking spam groups based on mutual reinforcement. The second problem is opinion spam detection in the absence of labeled data. This setting is important because manually labeling fake reviews or reviewers is hard and error-prone. Our solution is based on the hypothesis that spammers differ markedly from others on behavioral dimensions, which creates a distributional divergence between two (latent) population clusters: spammers and non-spammers. Modeling the spamicity of users as “latent” with observed behavioral footprints, novel generative models are proposed for detecting opinion spam/fraud.
Natural language processing