
An improved machine learning protocol for the identification of correct Sequest search results

BACKGROUND: Mass spectrometry has become a standard method for characterizing the proteomic profile of cell or tissue samples. To take full advantage of tandem mass spectrometry (MS/MS) techniques in large-scale protein characterization studies, robust and consistent data analysis procedures...


Bibliographic Details
Main authors: Källberg, Morten; Lu, Hui
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013103/
https://www.ncbi.nlm.nih.gov/pubmed/21138573
http://dx.doi.org/10.1186/1471-2105-11-591
author Källberg, Morten
Lu, Hui
collection PubMed
description BACKGROUND: Mass spectrometry has become a standard method for characterizing the proteomic profile of cell or tissue samples. To take full advantage of tandem mass spectrometry (MS/MS) techniques in large-scale protein characterization studies, robust and consistent data analysis procedures are crucial. In this work we present a machine-learning-based protocol for identifying correct peptide-spectrum matches in Sequest database search results, improving on previously published protocols. RESULTS: The developed model improves on published machine learning classification procedures by 6%, as measured by the area under the ROC curve. Further, we show how the model can be presented as an interpretable tree of additive rules, which removes the 'black-box' notion often associated with machine learning classifiers and allows comparison with expert rules of thumb. Finally, we propose and test a method for extending the peptide identification protocol to give probabilistic estimates of the presence of a given protein. CONCLUSIONS: We demonstrate the construction of a high-accuracy classification model for Sequest search results from MS/MS spectra obtained using MALDI ionization. The developed model performs well in identifying correct peptide-spectrum matches and is easily extended to the protein identification problem. The relative ease with which additional experimental parameters can be incorporated into the classification framework to give additional discriminatory power allows for future tailoring of the model to exploit information from specific instrument set-ups.
format Text
id pubmed-3013103
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3013103 2011-01-04 An improved machine learning protocol for the identification of correct Sequest search results Källberg, Morten Lu, Hui BMC Bioinformatics Methodology Article
BioMed Central 2010-12-07 /pmc/articles/PMC3013103/ /pubmed/21138573 http://dx.doi.org/10.1186/1471-2105-11-591 Text en Copyright ©2010 Källberg and Lu; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An improved machine learning protocol for the identification of correct Sequest search results
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013103/
https://www.ncbi.nlm.nih.gov/pubmed/21138573
http://dx.doi.org/10.1186/1471-2105-11-591