University of Illinois at Chicago
File(s) under embargo: 1 year(s), 6 month(s), 9 day(s) until file(s) become available

Integrating Image Features to Support the Biocuration Workflow

thesis
posted on 2023-12-01, 00:00 authored by Juan Trelles Trabucco
Images in scientific publications communicate essential information, such as experiments performed and results obtained, that researchers can use as a proxy for the publication’s relevance to their research interests. Domains like biocuration, where biomedical experts analyze the literature to extract and organize information for future retrieval and analysis, can benefit from computationally leveraging these images in their workflows. Prior work in biomedical document classification for biocuration has shown that combining features from images (e.g., acquisition modality) with text features can improve precision and recall. However, the considerations involved in, and the benefits obtained from, computationally integrating such image-based features into biocuration workflows are less clear. Based on a multi-year collaboration with text-mining researchers and biocurators, this dissertation identifies and tackles several challenges to the computational use of image data in the biocuration process. These challenges include the scarcity of labeled datasets for training image classifiers, the lack of characterization of the modality classes needed in biocuration, the lack of available systems for labeling extensive collections of images, and the lack of support for image-based features in academic search engines. This dissertation addresses these issues by proposing two taxonomies for image modalities found in biomedical publications; introducing training strategies that use shallower neural networks and a hierarchical organization of classifiers; describing two labeling systems, one for domain experts and one for model builders who require support for data understanding; and merging several of these building blocks into document search systems that successfully leverage image-based data.

History

Advisor

Georgeta Elisabeta Marai

Department

Computer Science

Degree Grantor

University of Illinois Chicago

Degree Level

  • Doctoral

Degree name

PhD, Doctor of Philosophy

Committee Member

Andrew Johnson, Wei Tang, Cecilia Arighi, Steven M. Drucker

Thesis type

application/pdf

Language

  • en
