Comparison of false-discovery rates of various decoy databases

BACKGROUND: The target-decoy strategy effectively estimates the false-discovery rate (FDR) by creating a decoy database with a size identical to that of the target database. Decoy databases are created by various methods, such as the reverse, pseudo-reverse, shuffle, pseudo-shuffle, and the de Brui...

Bibliographic Details
Main Authors: Lee, Sangjeong; Park, Heejin; Kim, Hyunwoo
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8449453/
https://www.ncbi.nlm.nih.gov/pubmed/34537052
http://dx.doi.org/10.1186/s12953-021-00179-7
author Lee, Sangjeong
Park, Heejin
Kim, Hyunwoo
collection PubMed
description BACKGROUND: The target-decoy strategy effectively estimates the false-discovery rate (FDR) by creating a decoy database with a size identical to that of the target database. Decoy databases are created by various methods, such as the reverse, pseudo-reverse, shuffle, pseudo-shuffle, and de Bruijn methods. The FDR is sometimes over- or under-estimated depending on which decoy database is used, because the ratio of redundant peptides differs between target databases; that is, the numbers of unique (non-redundant) peptides in the target and decoy databases differ. RESULTS: We used two protein databases (the UniProt Saccharomyces cerevisiae protein database and the UniProt human protein database) to compare the FDRs of various decoy databases. When the ratio of redundant peptides in the target database is low, the FDR is not overestimated by any decoy construction method. However, if the ratio of redundant peptides in the target database is high, the FDR is overestimated when the (pseudo) shuffle decoy database is used. Additionally, the human and S. cerevisiae six-frame translation databases, which are large databases, showed outcomes similar to those from the UniProt human protein database. CONCLUSION: The FDR must be estimated using the correction factor proposed by Elias and Gygi or that by Kim et al. when (pseudo) shuffle decoy databases are used. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12953-021-00179-7.
format Online
Article
Text
id pubmed-8449453
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8449453 2021-09-20 Comparison of false-discovery rates of various decoy databases Lee, Sangjeong Park, Heejin Kim, Hyunwoo Proteome Sci Research
BioMed Central 2021-09-18 /pmc/articles/PMC8449453/ /pubmed/34537052 http://dx.doi.org/10.1186/s12953-021-00179-7 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Comparison of false-discovery rates of various decoy databases
topic Research
work_keys_str_mv AT leesangjeong comparisonoffalsediscoveryratesofvariousdecoydatabases
AT parkheejin comparisonoffalsediscoveryratesofvariousdecoydatabases
AT kimhyunwoo comparisonoffalsediscoveryratesofvariousdecoydatabases
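
The abstract names four sequence-based decoy constructions (reverse, pseudo-reverse, shuffle, pseudo-shuffle) and ties FDR over-estimation to the number of unique peptides in each database. The Python sketch below illustrates one plausible reading of those constructions; it is not the authors' code. The simplified tryptic digestion (cleave after K or R, no proline rule, no missed cleavages), every function name, and the `scale` parameter are assumptions made for illustration. In particular, `scale` is only a placeholder for a correction factor and does not reproduce the formula of Elias and Gygi or of Kim et al.

```python
import random


def tryptic_peptides(protein):
    """Split after every K or R (simplified digestion, an assumption)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def reverse_decoy(protein):
    """Reverse the whole protein sequence."""
    return protein[::-1]


def pseudo_reverse_decoy(protein):
    """Reverse each peptide but keep its C-terminal K/R in place."""
    parts = []
    for pep in tryptic_peptides(protein):
        if pep[-1] in "KR":
            parts.append(pep[:-1][::-1] + pep[-1])
        else:
            parts.append(pep[::-1])
    return "".join(parts)


def shuffle_decoy(protein, rng):
    """Randomly permute all residues of the protein."""
    residues = list(protein)
    rng.shuffle(residues)
    return "".join(residues)


def pseudo_shuffle_decoy(protein, rng):
    """Shuffle each peptide but keep its C-terminal K/R in place."""
    parts = []
    for pep in tryptic_peptides(protein):
        keep = pep[-1] if pep[-1] in "KR" else ""
        body = list(pep[:len(pep) - len(keep)])
        rng.shuffle(body)
        parts.append("".join(body) + keep)
    return "".join(parts)


def unique_peptides(proteins):
    """Set of distinct peptides over a whole database."""
    return {pep for protein in proteins for pep in tryptic_peptides(protein)}


def estimate_fdr(target_hits, decoy_hits, scale=1.0):
    """Basic target-decoy estimate: scale * decoy hits / target hits.

    `scale` is a hypothetical stand-in for a correction factor.
    """
    if target_hits == 0:
        return 0.0
    return scale * decoy_hits / target_hits


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Toy target database: the peptide ACDEK occurs in both proteins.
    targets = ["ACDEKGHILR", "ACDEKMNPQR"]
    pseudo_rev = [pseudo_reverse_decoy(p) for p in targets]
    pseudo_shuf = [pseudo_shuffle_decoy(p, rng) for p in targets]
    print("unique target peptides:", len(unique_peptides(targets)))
    print("unique pseudo-reverse peptides:", len(unique_peptides(pseudo_rev)))
    print("unique pseudo-shuffle peptides:", len(unique_peptides(pseudo_shuf)))
```

In this toy database the redundant peptide ACDEK always maps to the same pseudo-reverse decoy, so the pseudo-reverse database has as many unique peptides as the target. Each occurrence is shuffled independently in the pseudo-shuffle case, so the copies can diverge and the decoy database can end up with more unique peptides than the target, which is consistent with the over-estimation the abstract reports for highly redundant target databases.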