Posted on 2024-08-01, 00:00, authored by Mohammad Arvan
Recent years have witnessed substantial growth in Machine Learning (ML) and Natural
Language Processing (NLP), largely fueled by the accessibility and openness of data and
models—a cornerstone of Open Science. This dissertation builds on this foundation by
integrating additional principles of Open Science—transparency, scrutiny, critique, and
reproducibility—into the study of these fields.
The dissertation examines the challenges of reproducibility in both automatic and human evaluations in ML. It begins by unraveling the hidden complexities of evaluating uncertainty. It emphasizes the need for rigorous statistical analysis, including effect size estimation and power analysis, and acknowledges that the risk of false discoveries persists even with careful design. This is complemented by a comprehensive guide to estimating and reporting uncertainty in evaluations, a resource intended to help researchers improve the reliability of their findings.
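As a rough illustration of the effect size and power analysis mentioned above, the sketch below compares two sets of evaluation scores. It is not the dissertation's own procedure. It assumes two independent groups of scores, uses Cohen's d with a pooled standard deviation as the effect size, and uses a normal approximation to the two-sided two-sample t-test for power. The function names `cohens_d`, `approx_power`, and `required_n` are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test for effect size d with equal group
    sizes (normal approximation; slightly optimistic for small samples)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(d) * math.sqrt(n_per_group / 2)  # standardized mean shift
    # Probability of rejecting H0 in either tail when the true effect is d.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


def required_n(d, power=0.8, alpha=0.05):
    """Per-group sample size needed to reach the target power (normal approximation)."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    return math.ceil(2 * ((z_a + z_b) / d) ** 2)


if __name__ == "__main__":
    # Hypothetical per-item scores from two systems, for illustration only.
    system_a = [0.71, 0.68, 0.74, 0.70, 0.69, 0.73]
    system_b = [0.66, 0.70, 0.65, 0.68, 0.67, 0.64]
    d = cohens_d(system_a, system_b)
    print(f"effect size d = {d:.2f}")
    print(f"power with n=6 per group: {approx_power(d, 6):.2f}")
    print(f"n per group for 80% power at d=0.5: {required_n(0.5)}")
```

Under this approximation, a medium effect (d = 0.5) needs about 63 items per group to reach 80% power. An exact t-test calculation gives a slightly larger number. An underpowered comparison can miss real differences, and it can also exaggerate the effects that do reach significance. That is one reason false discoveries remain a risk even when a result passes a significance test.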
To further dissect reproducibility challenges, we investigate trends in the availability of research artifacts and examine the impact of community-driven initiatives aimed at improving reporting practices. We also present a reproducibility assessment of eight scientific papers. Despite some improvements spurred by these initiatives, major issues that hinder reproducibility remain. An in-depth case study on the reproducibility of a text simplification pipeline reveals several overlooked challenges, such as bugs and dependency issues. The reproducibility of human evaluations is also scrutinized through two case studies. After observing mixed results, we identify several factors that contribute to inconsistencies in human evaluations, including small sample sizes and dynamic conditions. Through these analyses, the dissertation underscores the ongoing challenges of achieving reproducibility in ML and NLP and offers insights to strengthen the reliability of future research in these rapidly evolving fields.
Advisor: Natalie Parde
Department: Computer Science
Degree Grantor: University of Illinois Chicago
Degree Level: Doctoral
Degree name: Doctor of Philosophy
Committee Members: Barbara Di Eugenio, Xinhua Zhang, Luis Gabriel Ganchinho de Pina, Ehud Reiter