Explainable Predictive Modeling and Synthetic Sample Generation for Limited Spectral Data
thesis posted on 2022-08-01, 00:00 authored by Frantishek Akulich
To assess and optimize the performance of combustion systems, it is necessary to characterize fuel ignition quality. Acquiring the ignition properties of fuels is a tedious process that involves sample preparation and ignition quality testing. With advances in alternative approaches to defining fuels, particularly through a digital spectroscopic signature, a new possibility to simplify and automate the process of acquiring such knowledge has emerged. In this work, we harness automated statistical learning to map between the underlying chemical properties of a fuel and the spectroscopic data on which they are imprinted. Ensuring that this mapping is accurate and interpretable establishes a pathway for efficiently and fully automating the extraction of various attributes from any fuel. However, the high-dimensional nature of spectroscopic data, its scarcity, and the noise associated with the data collection process are key roadblocks to accomplishing such a task. In this research, we address these issues by integrating machine learning predictive modeling, interpretable feature selection techniques, and synthetic data generation, respectively. In the first part, we investigate the most commonly used feature selection techniques and adopt recent, advanced explainable AI techniques to interpret the prediction outcomes of high-dimensional and limited spectral data. Interpreting the prediction outcome is beneficial for domain experts, as it ensures the transparency of the ML models and their faithfulness to domain knowledge. Given instrument resolution limitations, pinpointing the important regions of the spectroscopic data creates a pathway to optimize the data collection process through miniaturization of the spectrometer device. Reducing the device size and power, and hence cost, is essential for real-world deployment of such an end-to-end system.
Furthermore, we consider a wide range of machine learning models that have proven successful for predicting the Cetane Number of fuels. We specifically design three different scenarios to ensure that the evaluation of the ML models is robust for the real-time practice of the developed methodologies and to uncover the hidden effect of various noise sources (statistical and from data collection) on the final outcome. The evaluation is performed on a real dataset for both the full model and reduced models obtained with different feature selection techniques. In the second part, we devise a deep generative technique that produces high-fidelity, high-diversity synthetic spectroscopy samples learned from the original dataset, in order to expand our limited data pool and improve its representation. Our developed GAN model is then evaluated using statistical similarity, prediction model efficacy, and domain-expert conformance metrics. The results indicate a tangible improvement in the prediction model's ability to generalize to unseen data. To further enhance the transparency of the entire process, we employ the GAN to produce samples of a specific group of pure alkane mixtures and compare them to the expected output. We demonstrate that our data synthesis approach can learn and reproduce spectroscopic samples that have the physical attributes of real fuels despite being artificial.
Department: Mechanical and Industrial Engineering
Degree Grantor: University of Illinois at Chicago
Degree name: MS, Master of Science
Committee Member: Brezinsky, Kenneth; Lynch, Patrick; Darabi, Houshang
Submitted date: August 2022