False discovery rate estimation using candidate peptides for each spectrum
Main authors: | Lee, Sangjeong; Park, Heejin; Kim, Hyunwoo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2022 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9623924/ https://www.ncbi.nlm.nih.gov/pubmed/36319948 http://dx.doi.org/10.1186/s12859-022-05002-4 |
author | Lee, Sangjeong; Park, Heejin; Kim, Hyunwoo |
collection | PubMed |
description | BACKGROUND: False discovery rate (FDR) estimation is essential in proteomics. The target-decoy strategy (TDS), which is often used for FDR estimation, assumes that when spectra are identified incorrectly, the probabilities of the spectra matching a target or a decoy peptide are identical. However, for most spectra these probabilities are not identical. We propose cTDS (target-decoy strategy with candidate peptides), which estimates the FDR more accurately by using, for each spectrum, the probability that it is incorrectly identified as a target or a decoy peptide. RESULTS: For most spectra, the probability of being incorrectly identified as a target or decoy peptide is close to 0.5, but only about 1.14–4.85% of all spectra have a probability of exactly 0.5. We used an entrapment sequence method to demonstrate the accuracy of cTDS. For fixed FDR thresholds (1–10%), the false match rate (FMR) obtained with cTDS is closer to the threshold than the FMR obtained with TDS. We compared the number of peptide-spectrum matches (PSMs) obtained with TDS and cTDS at a 1% FDR threshold on the HEK293 dataset. In the first and third replications, cTDS yielded more PSMs than TDS for the reverse, pseudo-reverse, shuffle, and de Bruijn decoy databases (by about 0.001–0.132%), but fewer PSMs for the pseudo-shuffle database (by about 0.05–0.126%). In the second replication, cTDS yielded more PSMs than TDS for all databases (by about 0.013–0.274%). CONCLUSIONS: When spectra are identified incorrectly, the probabilities of matching a target or a decoy peptide are, in most cases, not identical. We therefore propose cTDS, which estimates the FDR more accurately by using the probability that each spectrum is incorrectly identified as a target or decoy peptide. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-05002-4. |
format | Online Article Text |
id | pubmed-9623924 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9623924 2022-11-02 False discovery rate estimation using candidate peptides for each spectrum Lee, Sangjeong; Park, Heejin; Kim, Hyunwoo BMC Bioinformatics Research BioMed Central 2022-11-01 /pmc/articles/PMC9623924/ /pubmed/36319948 http://dx.doi.org/10.1186/s12859-022-05002-4 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | False discovery rate estimation using candidate peptides for each spectrum |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9623924/ https://www.ncbi.nlm.nih.gov/pubmed/36319948 http://dx.doi.org/10.1186/s12859-022-05002-4 |
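The abstract contrasts the classic target-decoy estimate, which implicitly treats an incorrect match as equally likely to hit a target or a decoy peptide (probability 0.5), with cTDS, which uses a per-spectrum probability instead. The following is a minimal Python sketch of that contrast, not the authors' implementation. The names `PSM`, `decoy_fraction`, `tds_fdr`, and `weighted_fdr` are hypothetical, and two choices are illustrative assumptions rather than the cTDS formula from the paper: the per-spectrum decoy probability q is taken as the decoy share of that spectrum's candidate peptides, and each decoy hit is counted as (1 - q) / q expected incorrect target hits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PSM:
    """One peptide-spectrum match (top hit for a spectrum)."""
    score: float
    is_decoy: bool
    # Hypothetical field: probability that an *incorrect* match for this
    # spectrum lands on a decoy peptide. 0.5 is the classic TDS assumption.
    p_decoy: float = 0.5


def decoy_fraction(n_target_candidates: int, n_decoy_candidates: int) -> float:
    """Hypothetical estimate of p_decoy: the share of decoy peptides among
    the spectrum's candidate peptides (e.g. those within precursor tolerance)."""
    total = n_target_candidates + n_decoy_candidates
    if total == 0:
        raise ValueError("spectrum has no candidate peptides")
    return n_decoy_candidates / total


def tds_fdr(psms: List[PSM], threshold: float) -> float:
    """Basic target-decoy estimate: #decoys / #targets at score >= threshold."""
    kept = [p for p in psms if p.score >= threshold]
    targets = sum(1 for p in kept if not p.is_decoy)
    decoys = sum(1 for p in kept if p.is_decoy)
    return min(1.0, decoys / targets) if targets else 0.0


def weighted_fdr(psms: List[PSM], threshold: float) -> float:
    """Illustrative per-spectrum variant (an assumption, not the cTDS formula):
    a decoy hit with decoy probability q stands for (1 - q) / q expected
    incorrect target hits. With q = 0.5 everywhere this equals tds_fdr."""
    targets = 0
    expected_false_targets = 0.0
    for p in psms:
        if p.score < threshold:
            continue
        if p.is_decoy:
            q = p.p_decoy
            if not 0.0 < q <= 1.0:
                raise ValueError("p_decoy must be in (0, 1] for a decoy hit")
            expected_false_targets += (1.0 - q) / q
        else:
            targets += 1
    return min(1.0, expected_false_targets / targets) if targets else 0.0
```

With every `p_decoy` left at 0.5, `weighted_fdr` reduces exactly to the decoy/target ratio of `tds_fdr`, which is the equal-probability assumption the abstract questions. A decoy hit on a spectrum whose candidates are mostly targets (small q) counts for more than one expected false target, and a decoy hit on a decoy-rich spectrum counts for less.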