A cross-validation scheme for machine learning algorithms in shotgun proteomics

Peptides are routinely identified from mass spectrometry-based proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed p...
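As an illustration of the target-decoy idea summarized above, the following is a minimal sketch, not the authors' implementation. It assumes separate target and decoy searches against databases of equal size, and it uses the simple ratio of decoy matches to target matches above a score threshold as the error-rate estimate. The function name `decoy_fdr` and all score values are hypothetical toy data invented for this example.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the false discovery rate among target matches scoring
    at or above `threshold` as (#decoys >= threshold) / (#targets >= threshold).

    Assumption: target and decoy databases have equal size, so the number
    of accepted decoys approximates the number of incorrect accepted targets.
    """
    n_targets = sum(score >= threshold for score in target_scores)
    n_decoys = sum(score >= threshold for score in decoy_scores)
    if n_targets == 0:
        # No accepted targets, so there is nothing to be wrong about.
        return 0.0
    # Cap at 1.0 because a rate above 100% is not meaningful.
    return min(1.0, n_decoys / n_targets)


# Hypothetical toy scores (higher means a better spectrum-peptide match).
target_scores = [9.1, 8.4, 7.7, 6.9, 5.2, 4.8, 3.3, 2.1]
decoy_scores = [5.0, 3.9, 3.1, 2.5, 1.8, 1.2, 0.9, 0.4]

fdr = decoy_fdr(target_scores, decoy_scores, threshold=4.5)
print(f"Estimated FDR at threshold 4.5: {fdr:.3f}")
```

With these toy values, six target matches and one decoy match pass the threshold, so the estimate is one in six. Raising the threshold trades fewer accepted identifications for a lower estimated error rate.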

Bibliographic Details
Main Authors: Granholm, Viktor, Noble, William Stafford, Käll, Lukas
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3489528/
https://www.ncbi.nlm.nih.gov/pubmed/23176259
http://dx.doi.org/10.1186/1471-2105-13-S16-S3
author Granholm, Viktor
Noble, William Stafford
Käll, Lukas
collection PubMed
description Peptides are routinely identified from mass spectrometry-based proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed peptides. Besides estimating error rates, decoy searches can be used by semi-supervised machine learning algorithms to increase the number of confidently identified peptides. As for all machine learning algorithms, however, the results must be validated to avoid issues such as overfitting or biased learning, which would produce unreliable peptide identifications. Here, we discuss how the target-decoy method is employed in machine learning for shotgun proteomics, focusing on how the results can be validated by cross-validation, a frequently used validation scheme in machine learning. We also use simulated data to demonstrate the proposed cross-validation scheme's ability to detect overfitting.
format Online
Article
Text
id pubmed-3489528
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-34895282012-11-08 A cross-validation scheme for machine learning algorithms in shotgun proteomics Granholm, Viktor Noble, William Stafford Käll, Lukas BMC Bioinformatics Review Peptides are routinely identified from mass spectrometry-based proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed peptides. Besides estimating error rates, decoy searches can be used by semi-supervised machine learning algorithms to increase the number of confidently identified peptides. As for all machine learning algorithms, however, the results must be validated to avoid issues such as overfitting or biased learning, which would produce unreliable peptide identifications. Here, we discuss how the target-decoy method is employed in machine learning for shotgun proteomics, focusing on how the results can be validated by cross-validation, a frequently used validation scheme in machine learning. We also use simulated data to demonstrate the proposed cross-validation scheme's ability to detect overfitting. BioMed Central 2012-11-05 /pmc/articles/PMC3489528/ /pubmed/23176259 http://dx.doi.org/10.1186/1471-2105-13-S16-S3 Text en Copyright ©2012 Granholm et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A cross-validation scheme for machine learning algorithms in shotgun proteomics
topic Review