
MUMAL: Multivariate analysis in shotgun proteomics using machine learning techniques

BACKGROUND: The shotgun strategy (liquid chromatography coupled with tandem mass spectrometry) is widely applied for identification of proteins in complex mixtures. This method gives rise to thousands of spectra in a single run, which are interpreted by computational tools. Such tools normally use a...


Bibliographic Details
Main authors: Cerqueira, Fabio R, Ferreira, Ricardo S, Oliveira, Alcione P, Gomes, Andreia P, Ramos, Humberto JO, Graber, Armin, Baumgartner, Christian
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3477001/
https://www.ncbi.nlm.nih.gov/pubmed/23095859
http://dx.doi.org/10.1186/1471-2164-13-S5-S4
author Cerqueira, Fabio R
Ferreira, Ricardo S
Oliveira, Alcione P
Gomes, Andreia P
Ramos, Humberto JO
Graber, Armin
Baumgartner, Christian
collection PubMed
description BACKGROUND: The shotgun strategy (liquid chromatography coupled with tandem mass spectrometry) is widely applied for identification of proteins in complex mixtures. This method gives rise to thousands of spectra in a single run, which are interpreted by computational tools. Such tools normally use a protein database from which peptide sequences are extracted for matching with experimentally derived mass spectral data. After the database search, the correctness of the obtained peptide-spectrum matches (PSMs) must also be evaluated algorithmically, as manual curation of these huge datasets would be impractical. The target-decoy database strategy is widely used to perform spectrum evaluation. Nonetheless, this method has been applied without considering sensitivity, i.e., only error estimation is taken into account. A recently proposed method termed MUDE treats the target-decoy analysis as an optimization problem, where sensitivity is maximized. This method demonstrates a significant increase in the retrieved number of PSMs for a fixed error rate. However, the MUDE model is constructed in such a way that linear decision boundaries are established to separate correct from incorrect PSMs. Besides, the described heuristic for solving the optimization problem has to be executed many times to achieve a significant increase in sensitivity.
RESULTS: Here, we propose a new method, termed MUMAL, for PSM assessment that is based on machine learning techniques. Our method can establish nonlinear decision boundaries, raising the chance of retrieving more true positives. Furthermore, we need only a few iterations to achieve high sensitivities, strikingly shortening the running time of the whole process. Experiments show that our method achieves a considerably higher number of PSMs compared with standard tools such as MUDE, PeptideProphet, and typical target-decoy approaches.
CONCLUSION: Our approach not only enhances the computational performance, and thus the turnaround time of MS-based experiments in proteomics, but also improves the information content, with the benefit of higher proteome coverage. This improvement, for instance, increases the chance of identifying important drug targets or biomarkers for drug development or molecular diagnostics.
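The abstract contrasts plain target-decoy error estimation with approaches that also maximize sensitivity at a fixed error rate. As a rough illustration only, the sketch below shows the conventional single-score target-decoy idea: estimate the false discovery rate (FDR) at a score threshold as decoy hits divided by target hits, then take the lowest threshold that still meets the error limit, which accepts the most target PSMs. This is not the MUMAL or MUDE algorithm; the function names, the (score, is_decoy) representation, and the example scores are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical PSM representation: (search score, True if the match is a decoy hit).
PSM = Tuple[float, bool]


def estimated_fdr(psms: List[PSM], threshold: float) -> float:
    """Estimate FDR at a threshold as (#decoys accepted) / (#targets accepted)."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so no estimated errors
    return decoys / targets


def best_threshold(psms: List[PSM], max_fdr: float) -> Optional[Tuple[float, int]]:
    """Return (threshold, accepted target count) for the lowest qualifying threshold.

    The number of accepted targets never grows as the threshold rises, so the
    lowest threshold that satisfies the FDR limit accepts the most targets.
    Returns None if no threshold accepts at least one target within the limit.
    """
    for threshold in sorted({score for score, _ in psms}):
        targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
        if targets == 0:
            continue
        if estimated_fdr(psms, threshold) <= max_fdr:
            return threshold, targets
    return None


# Made-up scores, purely for demonstration.
example_psms: List[PSM] = [
    (9.1, False), (8.4, False), (7.9, False), (7.2, True), (6.8, False),
    (6.1, False), (5.5, True), (5.0, False), (4.2, True), (3.9, True),
]

if __name__ == "__main__":
    print(best_threshold(example_psms, max_fdr=0.2))
```

A single threshold on one score is the simplest possible decision boundary; the multivariate methods named in the abstract combine several scores, which this sketch does not attempt.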
format Online
Article
Text
id pubmed-3477001
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3477001 2012-10-23 BMC Genomics Research BioMed Central 2012-10-19 /pmc/articles/PMC3477001/ /pubmed/23095859 http://dx.doi.org/10.1186/1471-2164-13-S5-S4 Text en Copyright ©2012 Cerqueira et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title MUMAL: Multivariate analysis in shotgun proteomics using machine learning techniques
topic Research