University of Illinois Chicago

A set of PubMed Case Series articles to serve as a training corpus for automated machine learning indexing methods

Version 2 2025-03-19, 21:33
Version 1 2025-03-17, 21:15
dataset
posted on 2025-03-19, 21:33 authored by Neil Smalheiser, Andrew Shahidehpour

We characterized the PubMed articles that mention “case series” in the title or abstract (written in English and published between January 1, 1987 and December 31, 2023). We removed articles that discuss case series studies rather than report their results, as well as articles better indexed under other standard publication types. Two annotators evaluated a random sample of the remaining articles and confirmed that the great majority satisfy a formal definition of “case series”. The result is a corpus of case series studies, listed by their PMIDs, that is suitable for use as a training set for automated machine learning indexing methods.

A manuscript describing the corpus in detail is forthcoming.

Funding

NIH grant 1R01LM014292-01

Language

  • en_US