Comparison of false-discovery rates of various decoy databases

BACKGROUND: The target-decoy strategy effectively estimates the false-discovery rate (FDR) by creating a decoy database with a size identical to that of the target database. Decoy databases are created by various methods, such as the reverse, pseudo-reverse, shuffle, pseudo-shuffle, and the de Brui...
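
As a rough illustration of four of the decoy construction methods named in the summary above (reverse, pseudo-reverse, shuffle, pseudo-shuffle), the following Python sketch builds decoy sequences from a single protein string. It is not the authors' implementation: all function names are hypothetical, the trypsin rule is simplified (cleave after every K or R, no proline exception, no missed cleavages), and the pseudo variants are assumed to keep each peptide's C-terminal K/R in place.

```python
import random
import re


def tryptic_peptides(protein):
    """Split a protein after every K or R (simplified trypsin rule, an assumption)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+$", protein)


def _keep_cterm(peptide, transform):
    """Apply `transform` to a peptide but leave a C-terminal K/R where it is."""
    if peptide[-1] in "KR":
        return transform(peptide[:-1]) + peptide[-1]
    return transform(peptide)


def _shuffled(seq, rng):
    """Return a random permutation of the residues in `seq`."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def reverse_decoy(protein):
    """Reverse the whole protein sequence."""
    return protein[::-1]


def pseudo_reverse_decoy(protein):
    """Reverse each tryptic peptide, keeping its C-terminal K/R fixed."""
    return "".join(
        _keep_cterm(p, lambda s: s[::-1]) for p in tryptic_peptides(protein)
    )


def shuffle_decoy(protein, rng):
    """Randomly permute the whole protein sequence."""
    return _shuffled(protein, rng)


def pseudo_shuffle_decoy(protein, rng):
    """Permute each tryptic peptide, keeping its C-terminal K/R fixed."""
    return "".join(
        _keep_cterm(p, lambda s: _shuffled(s, rng)) for p in tryptic_peptides(protein)
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the toy run is repeatable
    target = "MKTAYRGG"     # toy sequence, not taken from any real database
    for name, decoy in [
        ("reverse", reverse_decoy(target)),
        ("pseudo-reverse", pseudo_reverse_decoy(target)),
        ("shuffle", shuffle_decoy(target, rng)),
        ("pseudo-shuffle", pseudo_shuffle_decoy(target, rng)),
    ]:
        print(name, decoy)
```

In this sketch the pseudo variants preserve the cleavage sites, so a decoy protein yields the same number of tryptic peptides with the same lengths as its target, whereas a whole-protein shuffle moves the K/R residues and therefore changes the peptide set.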

Bibliographic Details
Main Authors:	Lee, Sangjeong; Park, Heejin; Kim, Hyunwoo
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8449453/ https://www.ncbi.nlm.nih.gov/pubmed/34537052 http://dx.doi.org/10.1186/s12953-021-00179-7

_version_	1784569422662336512
author	Lee, Sangjeong; Park, Heejin; Kim, Hyunwoo
collection	PubMed
description	BACKGROUND: The target-decoy strategy effectively estimates the false-discovery rate (FDR) by creating a decoy database with a size identical to that of the target database. Decoy databases are created by various methods, such as the reverse, pseudo-reverse, shuffle, pseudo-shuffle, and de Bruijn methods. The FDR is sometimes over- or underestimated depending on which decoy database is used because the ratios of redundant peptides in the target databases differ; that is, the numbers of unique (non-redundant) peptides in the target and decoy databases differ. RESULTS: We used two protein databases (the UniProt Saccharomyces cerevisiae protein database and the UniProt human protein database) to compare the FDRs of various decoy databases. When the ratio of redundant peptides in the target database is low, the FDR is not overestimated by any decoy construction method. However, if the ratio of redundant peptides in the target database is high, the FDR is overestimated when the (pseudo) shuffle decoy database is used. Additionally, the human and S. cerevisiae six-frame translation databases, which are large, showed outcomes similar to those from the UniProt human protein database. CONCLUSION: The FDR must be estimated using the correction factor proposed by Elias and Gygi or that proposed by Kim et al. when (pseudo) shuffle decoy databases are used. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12953-021-00179-7.
format	Online Article Text
id	pubmed-8449453
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8449453 2021-09-20 Comparison of false-discovery rates of various decoy databases Lee, Sangjeong; Park, Heejin; Kim, Hyunwoo Proteome Sci Research
BioMed Central 2021-09-18 /pmc/articles/PMC8449453/ /pubmed/34537052 http://dx.doi.org/10.1186/s12953-021-00179-7 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Comparison of false-discovery rates of various decoy databases
topic	Research
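
The description above turns on how the FDR is estimated from target and decoy matches. A minimal Python sketch, assuming two commonly used target-decoy estimates (decoys / targets, and 2 × decoys / (targets + decoys) for a search against a concatenated target-decoy database), is given below. The `correction` argument is a hypothetical placeholder multiplier: the actual correction factors of Elias and Gygi and of Kim et al. are defined in their papers and are not reproduced here. `unique_peptide_ratio` is likewise only an illustrative helper for comparing the numbers of unique peptides in two databases.

```python
def estimate_fdr(n_target, n_decoy, correction=1.0, concatenated=False):
    """Estimate the FDR from counts of target and decoy matches above a score cutoff.

    `correction` is a hypothetical multiplier standing in for a published
    correction factor; 1.0 means "no correction".
    """
    if n_target <= 0:
        raise ValueError("n_target must be positive")
    if n_decoy < 0:
        raise ValueError("n_decoy must be non-negative")
    if concatenated:
        # Both counts come from one search against target + decoy together.
        raw = 2.0 * n_decoy / (n_target + n_decoy)
    else:
        # Counts come from a target list filtered at a given cutoff.
        raw = n_decoy / n_target
    return correction * raw


def unique_peptide_ratio(target_peptides, decoy_peptides):
    """Ratio of unique target peptides to unique decoy peptides (illustrative only)."""
    n_decoy_unique = len(set(decoy_peptides))
    if n_decoy_unique == 0:
        raise ValueError("decoy peptide list is empty")
    return len(set(target_peptides)) / n_decoy_unique


if __name__ == "__main__":
    # Toy counts, not results from the paper.
    print(estimate_fdr(1000, 10))
    print(estimate_fdr(1000, 10, concatenated=True))
    # A target list with one redundant peptide versus a decoy list with none.
    print(unique_peptide_ratio(["AK", "AK", "CR"], ["KA", "RC", "GG"]))
```

A ratio different from 1 in the last call indicates that the target and decoy search spaces contain different numbers of unique peptides, which is the situation the description associates with over- or underestimated FDRs.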
