Comparison of false-discovery rates of various decoy databases
BACKGROUND: The target-decoy strategy effectively estimates the false-discovery rate (FDR) by creating a decoy database with a size identical to that of the target database. Decoy databases are created by various methods, such as the reverse, pseudo-reverse, shuffle, pseudo-shuffle, and the de Brui...
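
The decoy construction methods named in the summary can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the simplified trypsin rule (cleave after every K or R, with no proline exception), and the example sequence are all assumptions introduced here. The de Bruijn method is not shown.

```python
import random

def tryptic_peptides(protein):
    """Split after every K or R (simplified trypsin rule; an assumption)."""
    peptides, current = [], ""
    for residue in protein:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides

def reverse_decoy(protein):
    """Reverse the whole protein sequence."""
    return protein[::-1]

def pseudo_reverse_decoy(protein):
    """Reverse each tryptic peptide but keep its C-terminal K/R in place."""
    out = []
    for pep in tryptic_peptides(protein):
        if pep[-1] in "KR":
            out.append(pep[:-1][::-1] + pep[-1])
        else:
            out.append(pep[::-1])
    return "".join(out)

def shuffle_decoy(protein, rng):
    """Randomly permute all residues of the protein."""
    residues = list(protein)
    rng.shuffle(residues)
    return "".join(residues)

def pseudo_shuffle_decoy(protein, rng):
    """Shuffle each tryptic peptide but keep its C-terminal K/R in place."""
    out = []
    for pep in tryptic_peptides(protein):
        if pep[-1] in "KR":
            body, tail = list(pep[:-1]), pep[-1]
        else:
            body, tail = list(pep), ""
        rng.shuffle(body)
        out.append("".join(body) + tail)
    return "".join(out)

protein = "MKTAYRLLGK"          # hypothetical example sequence
rng = random.Random(0)          # seeded so repeated runs give the same decoys
print(reverse_decoy(protein))
print(pseudo_reverse_decoy(protein))
print(shuffle_decoy(protein, rng))
print(pseudo_shuffle_decoy(protein, rng))
```

The reverse and pseudo-reverse methods are deterministic, so identical target proteins always yield identical decoys; the shuffle variants depend on the random generator, so copies of the same target protein can yield different decoys.
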
Main Authors: | Lee, Sangjeong; Park, Heejin; Kim, Hyunwoo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8449453/ https://www.ncbi.nlm.nih.gov/pubmed/34537052 http://dx.doi.org/10.1186/s12953-021-00179-7 |
_version_ | 1784569422662336512 |
---|---|
author | Lee, Sangjeong Park, Heejin Kim, Hyunwoo |
author_facet | Lee, Sangjeong Park, Heejin Kim, Hyunwoo |
author_sort | Lee, Sangjeong |
collection | PubMed |
description | BACKGROUND: The target-decoy strategy effectively estimates the false-discovery rate (FDR) by creating a decoy database with a size identical to that of the target database. Decoy databases are created by various methods, such as the reverse, pseudo-reverse, shuffle, pseudo-shuffle, and the de Bruijn methods. FDR is sometimes over- or under-estimated depending on which decoy database is used because the ratios of redundant peptides in the target databases are different, that is, the numbers of unique (non-redundant) peptides in the target and decoy databases differ. RESULTS: We used two protein databases (the UniProt Saccharomyces cerevisiae protein database and the UniProt human protein database) to compare the FDRs of various decoy databases. When the ratio of redundant peptides in the target database is low, the FDR is not overestimated by any decoy construction method. However, if the ratio of redundant peptides in the target database is high, the FDR is overestimated when the (pseudo) shuffle decoy database is used. Additionally, human and S. cerevisiae six-frame translation databases, which are large databases, also showed outcomes similar to those from the UniProt human protein database. CONCLUSION: The FDR must be estimated using the correction factor proposed by Elias and Gygi or that by Kim et al. when (pseudo) shuffle decoy databases are used. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12953-021-00179-7. |
format | Online Article Text |
id | pubmed-8449453 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8449453 2021-09-20 Comparison of false-discovery rates of various decoy databases Lee, Sangjeong Park, Heejin Kim, Hyunwoo Proteome Sci Research 
BioMed Central 2021-09-18 /pmc/articles/PMC8449453/ /pubmed/34537052 http://dx.doi.org/10.1186/s12953-021-00179-7 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Lee, Sangjeong Park, Heejin Kim, Hyunwoo Comparison of false-discovery rates of various decoy databases |
title | Comparison of false-discovery rates of various decoy databases |
title_full | Comparison of false-discovery rates of various decoy databases |
title_fullStr | Comparison of false-discovery rates of various decoy databases |
title_full_unstemmed | Comparison of false-discovery rates of various decoy databases |
title_short | Comparison of false-discovery rates of various decoy databases |
title_sort | comparison of false-discovery rates of various decoy databases |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8449453/ https://www.ncbi.nlm.nih.gov/pubmed/34537052 http://dx.doi.org/10.1186/s12953-021-00179-7 |
work_keys_str_mv | AT leesangjeong comparisonoffalsediscoveryratesofvariousdecoydatabases AT parkheejin comparisonoffalsediscoveryratesofvariousdecoydatabases AT kimhyunwoo comparisonoffalsediscoveryratesofvariousdecoydatabases |
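
The description field attributes FDR over-estimation to a mismatch between the numbers of unique peptides in the target and decoy databases. The sketch below illustrates that mismatch on a toy database and shows a simple decoy-to-target FDR estimate. Everything named here is an assumption made for illustration: the simplified trypsin rule, the toy sequences, and in particular the `size_ratio` rescaling, which is only one plausible size-based correction. The record does not give the correction factors of Elias and Gygi or of Kim et al., and this code does not reproduce them.

```python
import random

def tryptic_peptides(protein):
    """Split after every K or R (simplified trypsin rule; an assumption)."""
    peptides, current = [], ""
    for residue in protein:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides

def unique_peptides(proteins):
    """Set of distinct peptides obtained by digesting every protein."""
    return {pep for protein in proteins for pep in tryptic_peptides(protein)}

def shuffle_protein(protein, rng):
    """Randomly permute all residues of one protein."""
    residues = list(protein)
    rng.shuffle(residues)
    return "".join(residues)

def estimated_fdr(n_target_hits, n_decoy_hits, size_ratio=1.0):
    """Decoy hits / target hits, optionally rescaled by a hypothetical size_ratio."""
    if n_target_hits == 0:
        return 0.0
    return size_ratio * n_decoy_hits / n_target_hits

# Toy target database with one duplicated protein, i.e. redundant peptides.
target_db = ["MKTAYRLLGK", "MKTAYRLLGK", "AAGKCCDR"]

rng = random.Random(0)
shuffled_db = [shuffle_protein(p, rng) for p in target_db]

n_target = len(unique_peptides(target_db))
n_decoy = len(unique_peptides(shuffled_db))

# Hypothetical rescaling: unique target peptides per unique decoy peptide.
size_ratio = n_target / n_decoy

print("unique target peptides:", n_target)
print("unique shuffled-decoy peptides:", n_decoy)
print("uncorrected FDR:", estimated_fdr(100, 5))
print("rescaled FDR:", estimated_fdr(100, 5, size_ratio))
```

Because each copy of a duplicated protein is shuffled independently, the copies generally stop sharing peptides, so a shuffled decoy database tends to contain more unique peptides than a redundant target database. The hit counts (100 target, 5 decoy) are made-up inputs, not results from the article.
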