
False discovery rate estimation using candidate peptides for each spectrum

BACKGROUND: False discovery rate (FDR) estimation is very important in proteomics. The target-decoy strategy (TDS), which is often used for FDR estimation, estimates the FDR under the assumption that when spectra are identified incorrectly, the probabilities of the spectra matching the target or dec...


Bibliographic Details
Main Authors: Lee, Sangjeong, Park, Heejin, Kim, Hyunwoo
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9623924/
https://www.ncbi.nlm.nih.gov/pubmed/36319948
http://dx.doi.org/10.1186/s12859-022-05002-4
_version_ 1784822115046785024
author Lee, Sangjeong
Park, Heejin
Kim, Hyunwoo
author_facet Lee, Sangjeong
Park, Heejin
Kim, Hyunwoo
author_sort Lee, Sangjeong
collection PubMed
description BACKGROUND: False discovery rate (FDR) estimation is very important in proteomics. The target-decoy strategy (TDS), which is often used for FDR estimation, estimates the FDR under the assumption that when spectra are identified incorrectly, the probabilities of the spectra matching the target or decoy peptides are identical. However, for most spectra these probabilities are not identical. We propose cTDS (target-decoy strategy with candidate peptides) for accurate estimation of the FDR using the probability that the spectrum is identified incorrectly as a target or decoy peptide. RESULTS: For most spectra, the probability of being identified incorrectly as a target or decoy peptide is close to 0.5, but only about 1.14–4.85% of all spectra have a probability of exactly 0.5. We used an entrapment sequence method to demonstrate the accuracy of cTDS. For fixed FDR thresholds (1–10%), the false match rate (FMR) in cTDS is closer to the threshold than the FMR in TDS. We compared the number of peptide-spectrum matches (PSMs) obtained with TDS and cTDS at a 1% FDR threshold on the HEK293 dataset. In the first and third replications, the number of PSMs obtained with cTDS for the reverse, pseudo-reverse, shuffle, and de Bruijn databases exceeded that obtained with TDS (by about 0.001–0.132%), whereas for the pseudo-shuffle database cTDS yielded fewer PSMs than TDS (by about 0.05–0.126%). In the second replication, the number of PSMs obtained with cTDS exceeded that obtained with TDS for all databases (by about 0.013–0.274%). CONCLUSIONS: When spectra are actually identified incorrectly, most probabilities of the spectra matching a target or decoy peptide are not identical. Therefore, we propose cTDS, which estimates the FDR more accurately using the probability of the spectrum being identified incorrectly as a target or decoy peptide.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-05002-4.
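The equal-probability assumption described in the abstract can be illustrated with a minimal Python sketch. The first function is the usual target-decoy estimate (decoy PSMs divided by target PSMs above a score threshold). The second is only an illustrative generalization, not the authors' cTDS formula, which this record does not give: it assumes each accepted decoy PSM comes with a hypothetical per-spectrum probability `p` that an incorrect match lands on a target peptide, and weights that decoy by `p / (1 - p)`. The function names and the weighting scheme are assumptions made for this example.

```python
def tds_fdr(n_target, n_decoy):
    """Classic target-decoy estimate: decoy PSMs / target PSMs.

    Implicitly assumes an incorrect match is equally likely to hit
    a target or a decoy peptide (probability 0.5 each).
    """
    if n_target == 0:
        return 0.0
    return n_decoy / n_target


def weighted_fdr(n_target, decoy_target_probs):
    """Illustrative generalization (an assumption, not the published cTDS).

    decoy_target_probs holds, for every accepted decoy PSM, the
    hypothetical probability p that an incorrect identification of
    that spectrum would have matched a target peptide instead.
    Each decoy then stands for p / (1 - p) expected false targets.
    """
    if n_target == 0:
        return 0.0
    false_targets = 0.0
    for p in decoy_target_probs:
        if not 0.0 <= p < 1.0:
            raise ValueError("each probability must satisfy 0 <= p < 1")
        false_targets += p / (1.0 - p)
    return false_targets / n_target


if __name__ == "__main__":
    # With p = 0.5 for every decoy, both estimates agree.
    print(tds_fdr(1000, 10))
    print(weighted_fdr(1000, [0.5] * 10))
    # If incorrect matches favour targets (p = 0.6), the plain
    # decoy count underestimates the number of false targets.
    print(weighted_fdr(1000, [0.6] * 10))
```

When every probability is exactly 0.5 the weighted estimate reduces to the classic one, which is why the share of spectra with a probability of exactly 0.5 matters for how far the two estimates can diverge.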
format Online
Article
Text
id pubmed-9623924
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9623924 2022-11-02 False discovery rate estimation using candidate peptides for each spectrum Lee, Sangjeong Park, Heejin Kim, Hyunwoo BMC Bioinformatics Research BACKGROUND: False discovery rate (FDR) estimation is very important in proteomics. The target-decoy strategy (TDS), which is often used for FDR estimation, estimates the FDR under the assumption that when spectra are identified incorrectly, the probabilities of the spectra matching the target or decoy peptides are identical. However, for most spectra these probabilities are not identical. We propose cTDS (target-decoy strategy with candidate peptides) for accurate estimation of the FDR using the probability that the spectrum is identified incorrectly as a target or decoy peptide. RESULTS: For most spectra, the probability of being identified incorrectly as a target or decoy peptide is close to 0.5, but only about 1.14–4.85% of all spectra have a probability of exactly 0.5. We used an entrapment sequence method to demonstrate the accuracy of cTDS. For fixed FDR thresholds (1–10%), the false match rate (FMR) in cTDS is closer to the threshold than the FMR in TDS. We compared the number of peptide-spectrum matches (PSMs) obtained with TDS and cTDS at a 1% FDR threshold on the HEK293 dataset. In the first and third replications, the number of PSMs obtained with cTDS for the reverse, pseudo-reverse, shuffle, and de Bruijn databases exceeded that obtained with TDS (by about 0.001–0.132%), whereas for the pseudo-shuffle database cTDS yielded fewer PSMs than TDS (by about 0.05–0.126%). In the second replication, the number of PSMs obtained with cTDS exceeded that obtained with TDS for all databases (by about 0.013–0.274%). CONCLUSIONS: When spectra are actually identified incorrectly, most probabilities of the spectra matching a target or decoy peptide are not identical. Therefore, we propose cTDS, which estimates the FDR more accurately using the probability of the spectrum being identified incorrectly as a target or decoy peptide. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-05002-4. BioMed Central 2022-11-01 /pmc/articles/PMC9623924/ /pubmed/36319948 http://dx.doi.org/10.1186/s12859-022-05002-4 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Lee, Sangjeong
Park, Heejin
Kim, Hyunwoo
False discovery rate estimation using candidate peptides for each spectrum
title False discovery rate estimation using candidate peptides for each spectrum
title_full False discovery rate estimation using candidate peptides for each spectrum
title_fullStr False discovery rate estimation using candidate peptides for each spectrum
title_full_unstemmed False discovery rate estimation using candidate peptides for each spectrum
title_short False discovery rate estimation using candidate peptides for each spectrum
title_sort false discovery rate estimation using candidate peptides for each spectrum
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9623924/
https://www.ncbi.nlm.nih.gov/pubmed/36319948
http://dx.doi.org/10.1186/s12859-022-05002-4
work_keys_str_mv AT leesangjeong falsediscoveryrateestimationusingcandidatepeptidesforeachspectrum
AT parkheejin falsediscoveryrateestimationusingcandidatepeptidesforeachspectrum
AT kimhyunwoo falsediscoveryrateestimationusingcandidatepeptidesforeachspectrum