Recent advancements in proteomics have improved protein identification, but a significant portion of the proteome remains undetected. Exploring these uncovered regions of the proteome may reveal valuable information such as post-translational modifications and protein isoforms. This study delves into the dark side of the proteome using several approaches. On the experimental side, a novel time-lapsed digestion method is developed to enhance sequence coverage and provide structural insights into the extracellular matrix (ECM) proteome. The ECM plays a vital role in cellular processes, and its dysregulation is associated with diseases such as fibrosis and cancer. While mass spectrometry-based proteomics has been instrumental in profiling ECM protein composition, it remains limited in capturing the dynamic range of protein abundance and in obtaining structural information. The optimized time-lapsed proteomics method in our study surpasses traditional workflows in sequence coverage, even outperforming traditional offline fractionation.
On the informatics side, a proteomics knowledge database, MatrisomeDB, is developed to enhance sequence coverage and to serve as a user-friendly web platform for the ECM research community. MatrisomeDB focuses solely on ECM proteomics, curating data from various ECM proteomics studies and enabling direct quantitative comparison between projects. The database includes a comprehensive collection of ECM proteoforms and achieves near-complete sequence coverage of the predicted matrisome.
Additionally, we use native limited proteolysis to validate protein structures predicted by AlphaFold. This is facilitated by a web application that visualizes sequence coverage on 3D protein structures. Protein structure determination has traditionally been challenging, but recent breakthroughs in deep learning-based protein structure prediction, such as AlphaFold, have shown promising results. However, the accuracy of predictions for proteins with unknown structures and domains still requires validation. The presented method, which combines native limited proteolysis with the visualization tool, offers a means to validate predictions and potentially an opportunity to refine predicted protein structures.
Overall, this study explores the dark side of the proteome and improves proteome sequence coverage by developing and implementing a novel time-lapsed digestion approach and a searchable proteomics knowledge database. Finally, as an application, we present a method to validate AlphaFold-predicted protein structures with native limited proteolysis.
Advisor
Gao, Yu
Chair
Gao, Yu
Department
Pharmaceutical Sciences
Degree Grantor
University of Illinois at Chicago
Degree Level
Doctoral
Degree Name
PhD, Doctor of Philosophy
Committee Member
Mankin, Alexander Shura
Federle, Michael
Burdette, Joanna
Naba, Alexandra