Identifying Medical Self-Disclosure in Online Platforms

Valizadeh, Mina

doi:10.25417/uic.23661480.v1

thesis

Posted on 2023-05-01, 00:00, authored by Mina Valizadeh

Medical self-disclosure is the communicative act of sharing personal information regarding medical symptoms, medications, diagnoses, or related content. Paradoxically, it may occur more frequently in online, potentially anonymous settings than in conversation with a trained physician. Disclosing health information may lead directly or indirectly to a variety of benefits, including earlier detection and treatment of medical issues that may have otherwise gone unaddressed; however, before benefits can be reaped, these disclosures must be recognized. Research towards detecting and analyzing online medical self-disclosure to date has been limited. In this dissertation, we address this shortcoming by establishing the novel task of automatically detecting medical self-disclosure. We introduce an initial dataset of 6,639 health-related posts collected from online social platforms, annotated with graded (No Self-Disclosure, Possible Self-Disclosure, and Clear Self-Disclosure) labels pertaining to medical self-disclosure specifically. Our predictive model trained on this dataset achieves a classification accuracy of 76.77%, establishing a strong early performance benchmark for this task. Following our establishment of data collection and task validity through our preliminary experiments, we conduct comprehensive follow-up work to manually expand, automatically augment, and systematically analyze this dataset and task. We publish a freely available, 3,919-instance expansion to our initial dataset comprising social media posts with clinically validated labels and high compatibility with the existing task-specific protocol. We also study the merits of pretraining task domain and text style by comparing Transformer-based models pretrained on a variety of general, medical, and social media sources and fine-tuned for this task. We find that a fine-tuned BERTweet model outperforms our earlier state-of-the-art by a substantial relative F-1 score increase of 16.73%. 
Finally, we investigate the role of other transfer learning and multitask learning paradigms in the context of medical self-disclosure. We also compare data augmentation techniques for this task, to assess the extent to which medical self-disclosure data may be further synthetically expanded. We find that this task poses many challenges for data augmentation techniques. We conclude by providing an in-depth analysis of identified trends, revealing exciting directions for follow-up work by others.

History

Advisor

Parde, Natalie

Chair

Parde, Natalie

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

Doctoral

Degree name

PhD, Doctor of Philosophy

Committee Member

Di Eugenio, Barbara; Caragea, Cornelia; Ziebart, Brian; Khetani, Mary A

Submitted date

May 2023

Thesis type

application/pdf

Language

en

Keywords

Machine Learning; Deep Learning; Natural Language Processing; Medical Self-disclosure Detection

Licence

In Copyright
