Diversionary Comments under Blog Posts.

Wang, J; Yu, CT; Yu, PS; Liu, B; Meng, WY

journal contribution

posted on 2016-05-04, 00:00 authored by J Wang, CT Yu, PS Yu, B Liu, WY Meng

There has been a recent swell of interest in the analysis of blog comments. However, much of the work focuses on detecting comment spam in the blogosphere. An important issue that has been neglected so far is the identification of diversionary comments. Diversionary comments are defined as comments that divert the topic from the original post. A possible purpose is to distract readers from the original topic and draw attention to a new topic. We categorize diversionary comments into five types based on our observations and propose an effective framework to identify and flag them. To the best of our knowledge, the problem of detecting diversionary comments has not been studied before. We approach the problem in two ways: (i) as a ranking problem, ordering all comments in descending order of how diversionary they are, and (ii) as a classification problem. Our evaluation on 4,179 comments under 40 different blog posts from Digg and Reddit shows that the proposed method achieves a high mean average precision of 91.9% when the problem is treated as a ranking problem and an F-measure of 84.9% when it is treated as a classification problem. Sensitivity analysis indicates that the effectiveness of the method is stable under different parameter settings.
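For readers unfamiliar with the two evaluation measures named in the abstract, the following is a minimal sketch of how mean average precision (for the ranking view) and F-measure (for the classification view) are commonly computed. It is not the authors' evaluation code: the function names and the toy labels are hypothetical, and the sketch assumes binary labels where 1 marks a diversionary comment and that average precision is computed per post and then averaged over posts.

```python
from typing import Sequence


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Average precision for one post.

    ranked_labels[i] is 1 if the comment at rank i+1 is diversionary, else 0.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_post: Sequence[Sequence[int]]) -> float:
    """Mean of the per-post average precision values."""
    return sum(average_precision(r) for r in per_post) / len(per_post)


def f_measure(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Balanced F-measure (F1) for the diversionary class."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration): two posts with ranked comment labels.
toy_rankings = [[1, 0, 1, 0], [0, 1]]
toy_map = mean_average_precision(toy_rankings)
toy_f1 = f_measure(predicted=[1, 1, 0, 0], actual=[1, 0, 1, 0])
```

In the ranking view, a score near 1 means diversionary comments are concentrated at the top of each post's ranked list; in the classification view, the F-measure balances precision and recall on the diversionary class.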

Funding

This work was partially supported by NSF through grants CNS-1115234 and IIS-1407927, a Google Research Award, and the Pinnacle Lab at Singapore Management University.

Publisher Statement

This is a non-final version of an article published in final form in Wang, J., Yu, C. T., Yu, P. S., Liu, B. and Meng, W. Y. Diversionary Comments under Blog Posts. ACM Transactions on the Web. 2015. 9(4). DOI: 10.1145/2789211.

Publisher

Association for Computing Machinery (ACM)

issn

1559-1131

Issue date

2015-10-01

Keywords

Diversionary comments; spam; topic model; latent Dirichlet allocation (LDA); hierarchical Dirichlet process (HDP); coreference resolution; extraction from Wikipedia; ranking; classification

Licence

In Copyright
