Silver: Forging almost Gold Standard Datasets

Gene set analysis has been widely used to gain insight from high-throughput expression studies. Although various tools and methods have been developed for gene set analysis, there is no consensus among researchers regarding best practice(s). Most often, evaluation studies have reported contradictory...

Bibliographic Details
Main authors: Maleki, Farhad; Ovens, Katie; McQuillan, Ian; Kusalik, Anthony J.
Format: Online Article Text
Language: English
Published: MDPI 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8535810/
https://www.ncbi.nlm.nih.gov/pubmed/34680918
http://dx.doi.org/10.3390/genes12101523
author Maleki, Farhad
Ovens, Katie
McQuillan, Ian
Kusalik, Anthony J.
collection PubMed
description Gene set analysis has been widely used to gain insight from high-throughput expression studies. Although various tools and methods have been developed for gene set analysis, there is no consensus among researchers regarding best practice(s). Most often, evaluation studies have reported contradictory recommendations of which methods are superior. Therefore, an unbiased quantitative framework for evaluations of gene set analysis methods will be valuable. Such a framework requires gene expression datasets where enrichment status of gene sets is known a priori. In the absence of such gold standard datasets, artificial datasets are commonly used for evaluations of gene set analysis methods; however, they often rely on oversimplifying assumptions that make them biased in favor of or against a given method. In this paper, we propose a quantitative framework for evaluation of gene set analysis methods by synthesizing expression datasets using real data, without relying on oversimplifying or unrealistic assumptions, while preserving complex gene–gene correlations and retaining the distribution of expression values. The utility of the quantitative approach is shown by evaluating ten widely used gene set analysis methods. An implementation of the proposed method is publicly available. We suggest using Silver to evaluate existing and new gene set analysis methods. Evaluation using Silver provides a better understanding of current methods and can aid in the development of gene set analysis methods to achieve higher specificity without sacrificing sensitivity.
format Online
Article
Text
id pubmed-8535810
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8535810 2021-10-23 Silver: Forging almost Gold Standard Datasets Maleki, Farhad Ovens, Katie McQuillan, Ian Kusalik, Anthony J. Genes (Basel) Article MDPI 2021-09-28 /pmc/articles/PMC8535810/ /pubmed/34680918 http://dx.doi.org/10.3390/genes12101523 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Silver: Forging almost Gold Standard Datasets
topic Article