Posted on 2023-05-01, 00:00. Authored by Mina Valizadeh.
Medical self-disclosure is the communicative act of sharing personal information regarding medical symptoms, medications, diagnoses, or related content. Paradoxically, it may occur more frequently in online, potentially anonymous settings than in conversation with a trained physician. Disclosing health information may lead directly or indirectly to a variety of benefits, including earlier detection and treatment of medical issues that might otherwise have gone unaddressed; however, before these benefits can be reaped, the disclosures must be recognized. To date, research on detecting and analyzing online medical self-disclosure has been limited.
In this dissertation, we address this gap by establishing the novel task of automatically detecting medical self-disclosure. We introduce an initial dataset of 6,639 health-related posts collected from online social platforms and annotated with graded medical self-disclosure labels (No Self-Disclosure, Possible Self-Disclosure, and Clear Self-Disclosure). Our predictive model trained on this dataset achieves a classification accuracy of 76.77%, establishing a strong early performance benchmark for this task. Having established the validity of the data collection and the task through these preliminary experiments, we conduct comprehensive follow-up work to manually expand, automatically augment, and systematically analyze this dataset and task.
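The three-way graded labeling and the accuracy metric mentioned above can be illustrated with a minimal sketch. The integer encoding, the names `Disclosure` and `accuracy`, and the toy labels are assumptions made for illustration only; they are not taken from the dissertation's code or data.

```python
from enum import IntEnum
from typing import Sequence


class Disclosure(IntEnum):
    """Graded labels named in the abstract; the integer codes are an assumption."""
    NO_SELF_DISCLOSURE = 0
    POSSIBLE_SELF_DISCLOSURE = 1
    CLEAR_SELF_DISCLOSURE = 2


def accuracy(gold: Sequence[Disclosure], predicted: Sequence[Disclosure]) -> float:
    """Fraction of posts whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one labeled post is required")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical toy labels, not drawn from the dataset.
gold = [
    Disclosure.NO_SELF_DISCLOSURE,
    Disclosure.POSSIBLE_SELF_DISCLOSURE,
    Disclosure.CLEAR_SELF_DISCLOSURE,
    Disclosure.CLEAR_SELF_DISCLOSURE,
]
predicted = [
    Disclosure.NO_SELF_DISCLOSURE,
    Disclosure.CLEAR_SELF_DISCLOSURE,  # the one mismatch in this toy example
    Disclosure.CLEAR_SELF_DISCLOSURE,
    Disclosure.CLEAR_SELF_DISCLOSURE,
]
toy_accuracy = accuracy(gold, predicted)
```

Accuracy treats every mismatch alike, so confusing Possible with Clear Self-Disclosure costs the same as confusing No with Clear Self-Disclosure.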
We publish a freely available, 3,919-instance expansion to our initial dataset comprising social media posts with clinically validated labels and high compatibility with the existing task-specific protocol. We also study the merits of pretraining domain and text style by comparing Transformer-based models pretrained on a variety of general, medical, and social media sources and fine-tuned for this task. We find that a fine-tuned BERTweet model outperforms our earlier state-of-the-art model by a substantial relative F1 score increase of 16.73%. Finally, we investigate the role of other transfer learning and multitask learning paradigms in the context of medical self-disclosure, and we compare data augmentation techniques to assess the extent to which medical self-disclosure data may be further synthetically expanded. We find that this task poses many challenges for data augmentation techniques. We conclude by providing an in-depth analysis of identified trends, revealing exciting directions for follow-up work by others.
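Because the improvement above is reported as a relative increase, it may help to see how that differs from an absolute gain in F1 points. The following is a minimal sketch; the function name `relative_increase` and the two scores are hypothetical placeholders, since the abstract does not state the underlying absolute F1 values.

```python
def relative_increase(baseline: float, improved: float) -> float:
    """Relative increase of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Placeholder scores chosen only to make the arithmetic easy to follow;
# they are NOT the dissertation's results.
baseline_f1 = 0.50
improved_f1 = 0.75

relative_gain = relative_increase(baseline_f1, improved_f1)  # percent of baseline
absolute_gain = (improved_f1 - baseline_f1) * 100.0          # percentage points
```

With these placeholder values the relative gain is twice the absolute gain in points, which is why the two figures should not be read interchangeably.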
History
Advisor
Parde, Natalie
Chair
Parde, Natalie
Department
Computer Science
Degree Grantor
University of Illinois at Chicago
Degree Level
Doctoral
Degree Name
PhD, Doctor of Philosophy
Committee Member
Di Eugenio, Barbara
Caragea, Cornelia
Ziebart, Brian
Khetani, Mary A