Cargando…

Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges

Over the last decade, gene set analysis has become the first choice for gaining insights into underlying complex biology of diseases through gene expression and gene association studies. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results...

Descripción completa

Detalles Bibliográficos
Autores principales: Das, Samarendra, McClain, Craig J., Rai, Shesh N.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516904/
https://www.ncbi.nlm.nih.gov/pubmed/33286201
http://dx.doi.org/10.3390/e22040427
_version_ 1783587106000470016
author Das, Samarendra
McClain, Craig J.
Rai, Shesh N.
author_facet Das, Samarendra
McClain, Craig J.
Rai, Shesh N.
author_sort Das, Samarendra
collection PubMed
description Over the last decade, gene set analysis has become the first choice for gaining insights into underlying complex biology of diseases through gene expression and gene association studies. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Although gene set analysis approaches are extensively used in gene expression and genome wide association data analysis, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. In this article, we provide a comprehensive overview, statistical structure and steps of gene set analysis approaches used for microarrays, RNA-sequencing and genome wide association data analysis. Further, we also classify the gene set analysis approaches and tools by the type of genomic study, null hypothesis, sampling model and nature of the test statistic, etc. Rather than reviewing the gene set analysis approaches individually, we provide the generation-wise evolution of such approaches for microarrays, RNA-sequencing and genome wide association studies and discuss their relative merits and limitations. Here, we identify the key biological and statistical challenges in current gene set analysis, which will be addressed by statisticians and biologists collectively in order to develop the next generation of gene set analysis approaches. Further, this study will serve as a catalog and provide guidelines to genome researchers and experimental biologists for choosing the proper gene set analysis approach based on several factors.
format Online
Article
Text
id pubmed-7516904
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-75169042020-11-09 Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges Das, Samarendra McClain, Craig J. Rai, Shesh N. Entropy (Basel) Review Over the last decade, gene set analysis has become the first choice for gaining insights into underlying complex biology of diseases through gene expression and gene association studies. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Although gene set analysis approaches are extensively used in gene expression and genome wide association data analysis, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. In this article, we provide a comprehensive overview, statistical structure and steps of gene set analysis approaches used for microarrays, RNA-sequencing and genome wide association data analysis. Further, we also classify the gene set analysis approaches and tools by the type of genomic study, null hypothesis, sampling model and nature of the test statistic, etc. Rather than reviewing the gene set analysis approaches individually, we provide the generation-wise evolution of such approaches for microarrays, RNA-sequencing and genome wide association studies and discuss their relative merits and limitations. Here, we identify the key biological and statistical challenges in current gene set analysis, which will be addressed by statisticians and biologists collectively in order to develop the next generation of gene set analysis approaches. Further, this study will serve as a catalog and provide guidelines to genome researchers and experimental biologists for choosing the proper gene set analysis approach based on several factors. MDPI 2020-04-10 /pmc/articles/PMC7516904/ /pubmed/33286201 http://dx.doi.org/10.3390/e22040427 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Review
Das, Samarendra
McClain, Craig J.
Rai, Shesh N.
Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges
title Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges
title_full Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges
title_fullStr Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges
title_full_unstemmed Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges
title_short Fifteen Years of Gene Set Analysis for High-Throughput Genomic Data: A Review of Statistical Approaches and Future Challenges
title_sort fifteen years of gene set analysis for high-throughput genomic data: a review of statistical approaches and future challenges
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7516904/
https://www.ncbi.nlm.nih.gov/pubmed/33286201
http://dx.doi.org/10.3390/e22040427
work_keys_str_mv AT dassamarendra fifteenyearsofgenesetanalysisforhighthroughputgenomicdataareviewofstatisticalapproachesandfuturechallenges
AT mcclaincraigj fifteenyearsofgenesetanalysisforhighthroughputgenomicdataareviewofstatisticalapproachesandfuturechallenges
AT raisheshn fifteenyearsofgenesetanalysisforhighthroughputgenomicdataareviewofstatisticalapproachesandfuturechallenges