Deep-Learning-Derived Evaluation Metrics Enable Effective Benchmarking of Computational Tools for Phosphopeptide Identification

Tandem mass spectrometry (MS/MS)-based phosphoproteomics is a powerful technology for global phosphorylation analysis. However, applying four computational pipelines to a typical mass spectrometry (MS)-based phosphoproteomic dataset from a human cancer study, we observed a large discrepancy among th...

Bibliographic Details
Main Authors: Jiang, Wen; Wen, Bo; Li, Kai; Zeng, Wen-Feng; da Veiga Leprevost, Felipe; Moon, Jamie; Petyuk, Vladislav A.; Edwards, Nathan J.; Liu, Tao; Nesvizhskii, Alexey I.; Zhang, Bing
Format: Online Article Text
Language: English
Published: American Society for Biochemistry and Molecular Biology 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8609164/
https://www.ncbi.nlm.nih.gov/pubmed/34737085
http://dx.doi.org/10.1016/j.mcpro.2021.100171
_version_ 1784602872582766592
author Jiang, Wen
Wen, Bo
Li, Kai
Zeng, Wen-Feng
da Veiga Leprevost, Felipe
Moon, Jamie
Petyuk, Vladislav A.
Edwards, Nathan J.
Liu, Tao
Nesvizhskii, Alexey I.
Zhang, Bing
author_sort Jiang, Wen
collection PubMed
description Tandem mass spectrometry (MS/MS)-based phosphoproteomics is a powerful technology for global phosphorylation analysis. However, when applying four computational pipelines to a typical mass spectrometry (MS)-based phosphoproteomic dataset from a human cancer study, we observed a large discrepancy among the reported phosphopeptide identification and phosphosite localization results, underscoring a critical need for benchmarking. While efforts have been made to compare the performance of computational pipelines using data from synthetic phosphopeptides, evaluations involving real application data have been largely limited to comparing the numbers of phosphopeptide identifications because appropriate evaluation metrics are lacking. We investigated three deep-learning-derived features as potential evaluation metrics: phosphosite probability, Delta RT, and spectral similarity. Predicted phosphosite probability is computed by MusiteDeep, which provides high accuracy as previously reported; Delta RT is defined as the absolute difference between the observed retention time (RT) and the RT predicted by AutoRT; and spectral similarity is defined as the Pearson’s correlation coefficient between the observed spectrum and the spectrum predicted by pDeep2. Using a synthetic peptide dataset, we found that both Delta RT and spectral similarity provided excellent discrimination between correct and incorrect peptide-spectrum matches (PSMs), both when incorrect PSMs involved wrong peptide sequences and when they were caused only by incorrect phosphosite localization. Based on these results, we used all three deep-learning-derived features as evaluation metrics to compare different computational pipelines on a diverse set of phosphoproteomic datasets and showed their utility in benchmarking the performance of the pipelines. The benchmark metrics demonstrated in this study will enable users to select computational pipelines and parameters for routine analysis of phosphoproteomics data and will offer guidance for developers to improve computational methods.
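The two numeric metrics defined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and does not call AutoRT or pDeep2: the predicted retention time and predicted fragment intensities are assumed to be supplied by the caller, the function names `delta_rt` and `spectral_similarity` are hypothetical, and the observed and predicted intensity vectors are assumed to be already aligned on the same fragment ions.

```python
import math
from typing import Sequence


def delta_rt(observed_rt: float, predicted_rt: float) -> float:
    """Absolute difference between observed and predicted retention time.

    Both values are assumed to be in the same unit (for example, minutes).
    """
    return abs(observed_rt - predicted_rt)


def spectral_similarity(observed: Sequence[float],
                        predicted: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted intensities.

    Assumes both vectors list intensities for the same fragment ions
    in the same order (an assumption of this sketch).
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two aligned vectors with at least 2 values")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant intensities")
    return cov / math.sqrt(var_o * var_p)
```

Under these definitions, a smaller Delta RT and a spectral similarity closer to 1 both indicate closer agreement between a PSM and the deep-learning predictions.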
format Online
Article
Text
id pubmed-8609164
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher American Society for Biochemistry and Molecular Biology
record_format MEDLINE/PubMed
spelling pubmed-8609164 2021-11-29 Deep-Learning-Derived Evaluation Metrics Enable Effective Benchmarking of Computational Tools for Phosphopeptide Identification Jiang, Wen Wen, Bo Li, Kai Zeng, Wen-Feng da Veiga Leprevost, Felipe Moon, Jamie Petyuk, Vladislav A. Edwards, Nathan J. Liu, Tao Nesvizhskii, Alexey I. Zhang, Bing Mol Cell Proteomics Research Tandem mass spectrometry (MS/MS)-based phosphoproteomics is a powerful technology for global phosphorylation analysis. However, when applying four computational pipelines to a typical mass spectrometry (MS)-based phosphoproteomic dataset from a human cancer study, we observed a large discrepancy among the reported phosphopeptide identification and phosphosite localization results, underscoring a critical need for benchmarking. While efforts have been made to compare the performance of computational pipelines using data from synthetic phosphopeptides, evaluations involving real application data have been largely limited to comparing the numbers of phosphopeptide identifications because appropriate evaluation metrics are lacking. We investigated three deep-learning-derived features as potential evaluation metrics: phosphosite probability, Delta RT, and spectral similarity. Predicted phosphosite probability is computed by MusiteDeep, which provides high accuracy as previously reported; Delta RT is defined as the absolute difference between the observed retention time (RT) and the RT predicted by AutoRT; and spectral similarity is defined as the Pearson’s correlation coefficient between the observed spectrum and the spectrum predicted by pDeep2. Using a synthetic peptide dataset, we found that both Delta RT and spectral similarity provided excellent discrimination between correct and incorrect peptide-spectrum matches (PSMs), both when incorrect PSMs involved wrong peptide sequences and when they were caused only by incorrect phosphosite localization. Based on these results, we used all three deep-learning-derived features as evaluation metrics to compare different computational pipelines on a diverse set of phosphoproteomic datasets and showed their utility in benchmarking the performance of the pipelines. The benchmark metrics demonstrated in this study will enable users to select computational pipelines and parameters for routine analysis of phosphoproteomics data and will offer guidance for developers to improve computational methods. American Society for Biochemistry and Molecular Biology 2021-11-01 /pmc/articles/PMC8609164/ /pubmed/34737085 http://dx.doi.org/10.1016/j.mcpro.2021.100171 Text en © 2021 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Deep-Learning-Derived Evaluation Metrics Enable Effective Benchmarking of Computational Tools for Phosphopeptide Identification
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8609164/
https://www.ncbi.nlm.nih.gov/pubmed/34737085
http://dx.doi.org/10.1016/j.mcpro.2021.100171