
Improved machine learning method for analysis of gas phase chemistry of peptides


Bibliographic Details
Main authors: Gehrke, Allison; Sun, Shaojun; Kurgan, Lukasz; Ahn, Natalie; Resing, Katheryn; Kafadar, Karen; Cios, Krzysztof
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2612015/
https://www.ncbi.nlm.nih.gov/pubmed/19055745
http://dx.doi.org/10.1186/1471-2105-9-515
author Gehrke, Allison
Sun, Shaojun
Kurgan, Lukasz
Ahn, Natalie
Resing, Katheryn
Kafadar, Karen
Cios, Krzysztof
author_sort Gehrke, Allison
collection PubMed
description BACKGROUND: Accurate peptide identification is important to high-throughput proteomics analyses that use mass spectrometry. Search programs compare fragmentation spectra (MS/MS) of peptides from complex digests with theoretically derived spectra from a database of protein sequences. Improved discrimination is achieved with theoretical spectra that are based on simulating the gas phase chemistry of the peptides, but the limited understanding of those processes affects the accuracy of predictions from theoretical spectra. RESULTS: We employed a robust data mining strategy using new feature annotation functions of MAE software, which revealed under-prediction of the frequency of fragmentation at the second peptide bond. We applied methods of exploratory data analysis to pre-process the information in the MS/MS spectra, including data normalization and attribute selection, to reduce the attributes to a smaller, less correlated set for machine learning studies. We then compared our rule-building machine learning program, DataSqueezer, with commonly used association rule and decision tree algorithms. All of the machine learning algorithms used produced similar results that were consistent with expected properties for a second gas phase mechanism at the second peptide bond. CONCLUSION: The results provide compelling evidence that we have identified underlying chemical properties in the data that suggest the existence of an additional gas phase mechanism for the second peptide bond. Thus, the methods described in this study provide a valuable approach for analyses of this kind in the future.
format Text
id pubmed-2612015
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2612015 2008-12-30 Improved machine learning method for analysis of gas phase chemistry of peptides Gehrke, Allison; Sun, Shaojun; Kurgan, Lukasz; Ahn, Natalie; Resing, Katheryn; Kafadar, Karen; Cios, Krzysztof BMC Bioinformatics Research Article
BioMed Central 2008-12-03 /pmc/articles/PMC2612015/ /pubmed/19055745 http://dx.doi.org/10.1186/1471-2105-9-515 Text en Copyright © 2008 Gehrke et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Improved machine learning method for analysis of gas phase chemistry of peptides
topic Research Article