Silver: Forging almost Gold Standard Datasets
Gene set analysis has been widely used to gain insight from high-throughput expression studies. Although various tools and methods have been developed for gene set analysis, there is no consensus among researchers regarding best practice(s). Most often, evaluation studies have reported contradictory...
Main authors: | Maleki, Farhad; Ovens, Katie; McQuillan, Ian; Kusalik, Anthony J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8535810/ https://www.ncbi.nlm.nih.gov/pubmed/34680918 http://dx.doi.org/10.3390/genes12101523 |
author | Maleki, Farhad; Ovens, Katie; McQuillan, Ian; Kusalik, Anthony J. |
---|---|
collection | PubMed |
description | Gene set analysis has been widely used to gain insight from high-throughput expression studies. Although various tools and methods have been developed for gene set analysis, there is no consensus among researchers regarding best practice(s). Most often, evaluation studies have reported contradictory recommendations of which methods are superior. Therefore, an unbiased quantitative framework for evaluations of gene set analysis methods will be valuable. Such a framework requires gene expression datasets where enrichment status of gene sets is known a priori. In the absence of such gold standard datasets, artificial datasets are commonly used for evaluations of gene set analysis methods; however, they often rely on oversimplifying assumptions that make them biased in favor of or against a given method. In this paper, we propose a quantitative framework for evaluation of gene set analysis methods by synthesizing expression datasets using real data, without relying on oversimplifying or unrealistic assumptions, while preserving complex gene–gene correlations and retaining the distribution of expression values. The utility of the quantitative approach is shown by evaluating ten widely used gene set analysis methods. An implementation of the proposed method is publicly available. We suggest using Silver to evaluate existing and new gene set analysis methods. Evaluation using Silver provides a better understanding of current methods and can aid in the development of gene set analysis methods to achieve higher specificity without sacrificing sensitivity. |
format | Online Article Text |
id | pubmed-8535810 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
journal | Genes (Basel) |
published | MDPI, 2021-09-28 |
license | © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Silver: Forging almost Gold Standard Datasets |
topic | Article |