
An improved machine learning protocol for the identification of correct Sequest search results

Files in this item

File: 1471-2105-11-591.pdf (456KB)
Description: (no description provided)
Format: PDF
Title: An improved machine learning protocol for the identification of correct Sequest search results
Author(s): Källberg, Morten; Lu, Hui
Subject(s): Mass spectrometry; tandem mass spectrometry
Abstract: Background: Mass spectrometry has become a standard method by which the proteomic profile of cell or tissue samples is characterized. To take full advantage of tandem mass spectrometry (MS/MS) techniques in large-scale protein characterization studies, robust and consistent data analysis procedures are crucial. In this work we present a machine learning based protocol for the identification of correct peptide-spectrum matches from Sequest database search results, improving on previously published protocols. Results: The developed model improves on published machine learning classification procedures by 6% as measured by the area under the ROC curve. Further, we show how the developed model can be presented as an interpretable tree of additive rules, thereby effectively removing the ‘black-box’ notion often associated with machine learning classifiers and allowing for comparison with expert rules of thumb. Finally, a method for extending the developed peptide identification protocol to give probabilistic estimates of the presence of a given protein is proposed and tested. Conclusions: We demonstrate the construction of a high-accuracy classification model for Sequest search results from MS/MS spectra obtained using MALDI ionization. The developed model performs well in identifying correct peptide-spectrum matches and is easily extendable to the protein identification problem. The relative ease with which additional experimental parameters can be incorporated into the classification framework to give additional discriminatory power allows for future tailoring of the model to take advantage of information from specific instrument set-ups.
Issue Date: 2010-12-07
Publisher: BioMed Central
Citation Info: Källberg, M. & Lu, H. 2010. An improved machine learning protocol for the identification of correct Sequest search results. BMC Bioinformatics, 11: 591. DOI: 10.1186/1471-2105-11-591
Type: Article
Description: © 2010 Källberg and Lu; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The original work is available through BioMed Central at DOI: 10.1186/1471-2105-11-591
ISSN: 1471-2105
Sponsor: MK acknowledges support from the FMC Technologies Fund Fellowship. We acknowledge financial support from the University of Illinois at Chicago and the China National Basic Research Program 2007CB947800.
Date Available in INDIGO: 2011-05-27
