
Transfer posterior error probability estimation for peptide identification

BACKGROUND: In shotgun proteomics, database searching of tandem mass spectra results in a great number of peptide-spectrum matches (PSMs), many of which are false positives. Quality control of PSMs is a multiple hypothesis testing problem, and the false discovery rate (FDR) or the posterior error pr...


Bibliographic Details
Main Authors:	Yi, Xinpei; Gong, Fuzhou; Fu, Yan
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7199311/ https://www.ncbi.nlm.nih.gov/pubmed/32366221 http://dx.doi.org/10.1186/s12859-020-3485-y

author	Yi, Xinpei; Gong, Fuzhou; Fu, Yan
collection	PubMed
description	BACKGROUND: In shotgun proteomics, database searching of tandem mass spectra yields a large number of peptide-spectrum matches (PSMs), many of which are false positives. Quality control of PSMs is a multiple hypothesis testing problem, and the false discovery rate (FDR) or the posterior error probability (PEP) is the commonly used statistical confidence measure. PEP, also called local FDR, evaluates the confidence of individual PSMs and is thus more desirable than FDR, which evaluates the global confidence of a collection of PSMs. PEP can be estimated by decomposing the null and alternative distributions of PSM scores, provided that sufficient data are available. However, in many proteomic studies, only a group (subset) of PSMs, e.g. those with specific post-translational modifications, is of interest. The group can be very small, making direct PEP estimation from the group data inaccurate, especially in the high-score region where the score threshold is set. Using the whole set of PSMs to estimate the group PEP is also inappropriate, because the null and/or alternative distributions of the group can be very different from those of the combined scores. RESULTS: The transfer PEP algorithm is proposed to estimate more accurately the PEPs of peptide identifications in small groups. Transfer PEP derives the group null distribution through its empirical relationship with the combined null distribution, and estimates the group alternative distribution, as well as the null proportion, using an iterative semi-parametric method. Validated on both simulated data and real proteomic data, transfer PEP showed remarkably higher accuracy than the direct combined and separate PEP estimation methods. CONCLUSIONS: We presented a novel approach to group PEP estimation for small groups and implemented it for the peptide identification problem in proteomics. The methodology of the approach is in principle applicable to small-group PEP estimation problems in other fields.
format	Online Article Text
id	pubmed-7199311
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7199311 2020-05-08 Transfer posterior error probability estimation for peptide identification Yi, Xinpei; Gong, Fuzhou; Fu, Yan BMC Bioinformatics Methodology Article BioMed Central 2020-05-04 /pmc/articles/PMC7199311/ /pubmed/32366221 http://dx.doi.org/10.1186/s12859-020-3485-y Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Transfer posterior error probability estimation for peptide identification
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7199311/ https://www.ncbi.nlm.nih.gov/pubmed/32366221 http://dx.doi.org/10.1186/s12859-020-3485-y

