GEOlimma: differential expression analysis and feature selection using pre-existing microarray data

BACKGROUND: Differential expression and feature selection analyses are essential steps for the development of accurate diagnostic/prognostic classifiers of complicated human diseases using transcriptomics data. These steps are particularly challenging due to the curse of dimensionality and the prese...

Bibliographic Details
Main Authors: Lu, Liangqun, Townsend, Kevin A., Daigle, Bernie J.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7860207/
https://www.ncbi.nlm.nih.gov/pubmed/33535967
http://dx.doi.org/10.1186/s12859-020-03932-5
_version_ 1783646893964787712
author Lu, Liangqun
Townsend, Kevin A.
Daigle, Bernie J.
author_facet Lu, Liangqun
Townsend, Kevin A.
Daigle, Bernie J.
author_sort Lu, Liangqun
collection PubMed
description BACKGROUND: Differential expression and feature selection analyses are essential steps for the development of accurate diagnostic/prognostic classifiers of complicated human diseases using transcriptomics data. These steps are particularly challenging due to the curse of dimensionality and the presence of technical and biological noise. A promising strategy for overcoming these challenges is the incorporation of pre-existing transcriptomics data in the identification of differentially expressed (DE) genes. This approach has the potential to improve the quality of selected genes, increase classification performance, and enhance biological interpretability. While a number of methods have been developed that use pre-existing data for differential expression analysis, existing methods do not leverage the identities of experimental conditions to create a robust metric for identifying DE genes. RESULTS: In this study, we propose a novel differential expression and feature selection method—GEOlimma—which combines pre-existing microarray data from the Gene Expression Omnibus (GEO) with the widely-applied Limma method for differential expression analysis. We first quantify differential gene expression across 2481 pairwise comparisons from 602 curated GEO Datasets, and we convert differential expression frequencies to DE prior probabilities. Genes with high DE prior probabilities show enrichment in cell growth and death, signal transduction, and cancer-related biological pathways, while genes with low prior probabilities were enriched in sensory system pathways. We then applied GEOlimma to four differential expression comparisons within two human disease datasets and performed differential expression, feature selection, and supervised classification analyses. Our results suggest that use of GEOlimma provides greater experimental power to detect DE genes compared to Limma, due to its increased effective sample size. 
Furthermore, in a supervised classification analysis using GEOlimma as a feature selection method, we observed similar or better classification performance than Limma given small, noisy subsets of an asthma dataset. CONCLUSIONS: Our results demonstrate that GEOlimma is a more effective method for differential gene expression and feature selection analyses compared to the standard Limma method. Due to its focus on gene-level differential expression, GEOlimma also has the potential to be applied to other high-throughput biological datasets.
format Online
Article
Text
id pubmed-7860207
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-78602072021-02-05 GEOlimma: differential expression analysis and feature selection using pre-existing microarray data Lu, Liangqun Townsend, Kevin A. Daigle, Bernie J. BMC Bioinformatics Methodology Article BACKGROUND: Differential expression and feature selection analyses are essential steps for the development of accurate diagnostic/prognostic classifiers of complicated human diseases using transcriptomics data. These steps are particularly challenging due to the curse of dimensionality and the presence of technical and biological noise. A promising strategy for overcoming these challenges is the incorporation of pre-existing transcriptomics data in the identification of differentially expressed (DE) genes. This approach has the potential to improve the quality of selected genes, increase classification performance, and enhance biological interpretability. While a number of methods have been developed that use pre-existing data for differential expression analysis, existing methods do not leverage the identities of experimental conditions to create a robust metric for identifying DE genes. RESULTS: In this study, we propose a novel differential expression and feature selection method—GEOlimma—which combines pre-existing microarray data from the Gene Expression Omnibus (GEO) with the widely-applied Limma method for differential expression analysis. We first quantify differential gene expression across 2481 pairwise comparisons from 602 curated GEO Datasets, and we convert differential expression frequencies to DE prior probabilities. Genes with high DE prior probabilities show enrichment in cell growth and death, signal transduction, and cancer-related biological pathways, while genes with low prior probabilities were enriched in sensory system pathways. 
We then applied GEOlimma to four differential expression comparisons within two human disease datasets and performed differential expression, feature selection, and supervised classification analyses. Our results suggest that use of GEOlimma provides greater experimental power to detect DE genes compared to Limma, due to its increased effective sample size. Furthermore, in a supervised classification analysis using GEOlimma as a feature selection method, we observed similar or better classification performance than Limma given small, noisy subsets of an asthma dataset. CONCLUSIONS: Our results demonstrate that GEOlimma is a more effective method for differential gene expression and feature selection analyses compared to the standard Limma method. Due to its focus on gene-level differential expression, GEOlimma also has the potential to be applied to other high-throughput biological datasets. BioMed Central 2021-02-03 /pmc/articles/PMC7860207/ /pubmed/33535967 http://dx.doi.org/10.1186/s12859-020-03932-5 Text en © The Author(s) 2021 Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. 
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Methodology Article
Lu, Liangqun
Townsend, Kevin A.
Daigle, Bernie J.
GEOlimma: differential expression analysis and feature selection using pre-existing microarray data
title GEOlimma: differential expression analysis and feature selection using pre-existing microarray data
title_full GEOlimma: differential expression analysis and feature selection using pre-existing microarray data
title_fullStr GEOlimma: differential expression analysis and feature selection using pre-existing microarray data
title_full_unstemmed GEOlimma: differential expression analysis and feature selection using pre-existing microarray data
title_short GEOlimma: differential expression analysis and feature selection using pre-existing microarray data
title_sort geolimma: differential expression analysis and feature selection using pre-existing microarray data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7860207/
https://www.ncbi.nlm.nih.gov/pubmed/33535967
http://dx.doi.org/10.1186/s12859-020-03932-5
work_keys_str_mv AT luliangqun geolimmadifferentialexpressionanalysisandfeatureselectionusingpreexistingmicroarraydata
AT townsendkevina geolimmadifferentialexpressionanalysisandfeatureselectionusingpreexistingmicroarraydata
AT daigleberniej geolimmadifferentialexpressionanalysisandfeatureselectionusingpreexistingmicroarraydata