Predicting Pathological Effect of Mutation and Identifying Cancer Driver Event Based on Protein Structure
thesis
posted on 2023-08-01, 00:00, authored by Boshen Wang
Identifying cancer driver mutations is
critical for understanding molecular mechanisms triggering tumorigenesis,
and for designing targeted treatments in precision oncology.
Most existing in-silico methods focus on
predicting cancer driver genes, but they cannot
identify the specific sites where mutations drive tumorigenesis against a noisy mutational background. Among the 2.94 million missense
mutations from COSMIC cancer samples, over 94% have low recurrence (<3)
and over 50% are predicted to exhibit pathogenic effects by various
bioinformatics methods. Without further information, it is challenging
to determine which mutations drive tumorigenesis using current frequency-based or mutation-effect-based methods.
In this study, we develop a new method called Structure-CAncer-Relationship-on-Pathogenicity (SCARP)
to identify cancer driver mutations by systematically integrating mutation effects,
co-clustering effects of spatial regions near the mutation sites, as well as mutation
recurrence. First, we use our novel Structure-Pathogenicity Relationship Identifier (SPRI)
method to estimate the
likelihood of pathogenicity of a specified mutation, as SPRI captures
essential biological properties from structural, biophysical, and evolutionary features,
and compares favorably with multiple state-of-the-art methods
in identifying deleterious mutations,
using Mendelian disease-type mutations as ground truth. Furthermore, it transfers well
to distinguishing cancer driver mutations from passenger mutations. Second, we quantify the influence of co-clustering mutations in the
structural neighborhood regions of the mutation site, as biological
functions often require specific structural arrangements of residues.
Third, we utilize mutation recurrence collected from pan-cancer or tissue-specific cancer
cohorts. We show that our method effectively identifies cancer driver genes and provides
detailed pathogenicity rankings of mutation sites. Our results show that accurate recognition of co-clustering mutational effects is
important for predicting site-specific cancer driver events.
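
The three ingredients named above (per-site mutation effect, co-clustering in the structural neighborhood, and recurrence) can be illustrated with a minimal sketch. This is not the SCARP or SPRI implementation: the function names, the 8.0 distance cutoff, the log-scaled recurrence term, and the weighted linear combination are all assumptions made for illustration, and the pathogenicity values stand in for scores that a predictor such as SPRI would supply.

```python
import math

def neighborhood_score(coords, patho, counts, site, radius=8.0):
    """Hypothetical co-clustering term: sum of pathogenicity x recurrence
    over residues within `radius` of `site`, excluding the site itself.
    The cutoff value and its units are illustrative assumptions."""
    total = 0.0
    for other, xyz in coords.items():
        if other == site:
            continue
        if math.dist(coords[site], xyz) <= radius:
            total += patho.get(other, 0.0) * counts.get(other, 0)
    return total

def combined_score(coords, patho, counts, site, radius=8.0,
                   w_effect=1.0, w_cluster=1.0, w_recur=1.0):
    """Hypothetical site score: a weighted sum of the three components.
    The weights and the linear form are assumptions, not the thesis method."""
    effect = patho.get(site, 0.0)               # per-site mutation effect
    cluster = neighborhood_score(coords, patho, counts, site, radius)
    recur = math.log1p(counts.get(site, 0))     # damped recurrence
    return w_effect * effect + w_cluster * cluster + w_recur * recur

# Toy data: residue index -> 3D coordinate, pathogenicity, mutation count.
coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 0.0, 0.0), 3: (20.0, 0.0, 0.0)}
patho = {1: 0.9, 2: 0.8, 3: 0.7}
counts = {1: 2, 2: 5, 3: 1}

ranking = sorted(coords, key=lambda s: combined_score(coords, patho, counts, s),
                 reverse=True)
```

In this toy example, residues 1 and 2 lie within the cutoff of each other, so each contributes to the other's neighborhood term, while residue 3 is isolated and is scored on its own effect and recurrence only.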