University of Illinois Chicago

Data Mining of High Dimensional Sparse Dataset: A Case Study of Nursing Electronic Health Records

thesis
posted on 2017-10-27, 00:00 authored by Muhammad K Lodhi
With the rapid growth of electronic data repositories in diverse application domains, considerable research interest has developed in extracting the knowledge hidden in these repositories. Among these application domains, healthcare data repositories in the form of electronic health record systems (EHRs) are the fastest growing in terms of size and data diversity. Currently, EHRs are mostly used to monitor the progress of patients; however, the real value of these systems lies in the knowledge hidden in the data in the form of best, or less than best, practices. One of the challenges in analyzing EHR data is its high dimensionality and sparseness. This also makes EHRs ideal research vehicles for investigating Big Data issues, such as storage and retrieval, as well as the development of analytics techniques and decision-making tools. This dissertation focuses on the development of analysis and knowledge discovery techniques for high-dimensional sparse data, using nursing care data as an exemplar. Medical data in general, and nursing data in particular, are also known for their complexity, owing to the variety and diversity of associated standards. A noteworthy gap in the literature is that only a few studies have focused on utilizing EHRs to improve quality of care for patients diagnosed with different illnesses. Mining a high-dimensional and sparse dataset is a challenging task. Although several dimension reduction methods have been proposed in the literature, they do not work well with contextual datasets such as EHRs. In this dissertation, we explore the use of association mining to achieve dimension reduction and to extract important features from the dataset. We propose an analysis and knowledge discovery framework for healthcare data that is then applied to the analysis and prediction of different outcomes relevant to healthcare providers, healthcare administrators, and patients.
The resulting predictive models can be used to determine the most effective treatments for individual patients and to standardize these treatments. In our work, we have applied the proposed framework to nursing diagnoses such as death anxiety, anticipatory grieving, and cancer, among others. Our results show, for example, that younger patients diagnosed with death anxiety had a lower chance of meeting their expected outcome than older patients. We also discovered that patients diagnosed with anticipatory grieving who also suffered from physical pain had a lower probability of meeting their expected outcome than those who did not suffer from pain. Based on this framework, we have also devised a hierarchical learning method to classify patients who are at risk of re-hospitalization within one month of discharge, as re-admission rates are increasingly used as a benchmark of the quality of healthcare provided to hospitalized patients. We also identify issues that trigger re-admission using different predictive models. In general, our predictive modeling results show that decision tree models have high accuracy and that their results are easy to interpret, making it straightforward to determine the influence of different variables.
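
As an illustration of how association mining can serve as a dimension reduction step on sparse data, the following is a minimal sketch and not the dissertation's actual method. It scores single-item rules of the form "feature implies outcome" by support, confidence, and lift, and keeps only the features that pass both thresholds. The function name, the thresholds, the item names, and the toy records are all hypothetical and made up for this example; they are not drawn from the nursing dataset.

```python
from collections import Counter


def select_features_by_association(records, outcome, min_support=0.2, min_lift=1.2):
    """Keep items whose rule `item -> outcome` is frequent and informative.

    records: list of sets of item names (one sparse row per patient episode).
    outcome: the item name treated as the rule consequent.
    Returns a dict mapping item -> (support, confidence, lift).
    """
    n = len(records)
    if n == 0:
        return {}

    item_counts = Counter()   # rows containing the item
    joint_counts = Counter()  # rows containing both the item and the outcome
    for row in records:
        has_outcome = outcome in row
        for item in row:
            if item == outcome:
                continue
            item_counts[item] += 1
            if has_outcome:
                joint_counts[item] += 1

    outcome_rate = sum(outcome in row for row in records) / n
    if outcome_rate == 0:
        return {}

    selected = {}
    for item, count in item_counts.items():
        support = count / n                      # P(item)
        confidence = joint_counts[item] / count  # P(outcome | item)
        lift = confidence / outcome_rate         # confidence relative to base rate
        if support >= min_support and lift >= min_lift:
            selected[item] = (support, confidence, lift)
    return selected


# Hypothetical toy records (invented for illustration only).
toy_records = [
    {"pain", "age_lt_65", "outcome_not_met"},
    {"pain", "outcome_not_met"},
    {"pain", "age_ge_65"},
    {"age_ge_65"},
    {"age_ge_65", "rare_item"},
    {"age_lt_65", "outcome_not_met"},
]

selected = select_features_by_association(toy_records, "outcome_not_met")
for name, (support, confidence, lift) in sorted(selected.items()):
    print(f"{name}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Representing each record as a set of present items avoids materializing the full sparse matrix. Items that are rare or that carry no lift over the outcome's base rate are dropped before any predictive model, such as a decision tree, is trained on the remaining features.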

History

Advisor

Kshemkalyani, Ajay

Chair

Kshemkalyani, Ajay

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

  • Doctoral

Committee Member

Khokhar, Ashfaq; DiEugenio, Barbara; Johnson, Andrew; Keenan, Gail; Wilkie, Diana

Submitted date

May 2017

Issue date

2017-03-16
