Posted on 2023-05-01. Authored by Souvik Bhattacharya.
The spread of fake news can have devastating ramifications, and recent advances in neural fake news generators have made it challenging to determine how misinformation produced by these models is best confronted. We conduct a feature-based study to gain an interpretable understanding of the linguistic attributes that neural fake news generators may most successfully exploit. Comparing models trained on subsets of our features and confronting them with increasingly advanced neural fake news, we find that stylistic features may be the most robust. We use five publicly available datasets: the BuzzFeed Real News Dataset, the BuzzFeed Fake News Dataset, the PolitiFact Real News Dataset, the Telling a Lie Corpus, and the PolitiFact Fact Check Dataset. We discuss our findings, subsequent analyses, and broader implications.
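To give a concrete flavor of what a feature-based approach looks like, the following is a minimal, hypothetical sketch: it extracts a few surface stylistic features from article text and applies a toy threshold rule. The specific features, threshold, and decision rule are illustrative assumptions only, not the study's actual feature set or models.

```python
# Hypothetical sketch of a stylistic-feature pipeline for fake news
# detection. Feature choices and the threshold below are illustrative
# assumptions, not the method described in the abstract.

def stylistic_features(text):
    """Compute a few simple surface stylistic features of an article."""
    words = text.split()
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    return {
        # Average words per sentence
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        # Exclamation marks per word
        "exclaim_rate": text.count("!") / max(len(words), 1),
        # Fraction of fully capitalized words
        "caps_rate": sum(w.isupper() for w in words) / max(len(words), 1),
    }

def looks_fake(text, exclaim_threshold=0.02):
    """Toy rule: flag text whose exclamation rate exceeds a threshold."""
    return stylistic_features(text)["exclaim_rate"] > exclaim_threshold

real = "The committee met on Tuesday. Members reviewed the annual budget."
fake = "SHOCKING! You will NOT believe this! Experts HATE it!"
print(looks_fake(real), looks_fake(fake))  # False True
```

In a full study, such hand-crafted features would feed a trained classifier rather than a fixed threshold, and subsets of features (stylistic vs. content-based, for example) could be compared by training separate models on each subset.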