
ADAGE signature analysis: differential expression analysis with data-defined gene sets

BACKGROUND: Gene set enrichment analysis and overrepresentation analyses are commonly used methods to determine the biological processes affected by a differential expression experiment. This approach requires biologically relevant gene sets, which are currently curated manually, limiting their avai...

Bibliographic Details
Main Authors: Tan, Jie; Huyck, Matthew; Hu, Dongbo; Zelaya, René A.; Hogan, Deborah A.; Greene, Casey S.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects: Methodology Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5700673/
https://www.ncbi.nlm.nih.gov/pubmed/29166858
http://dx.doi.org/10.1186/s12859-017-1905-4
author Tan, Jie
Huyck, Matthew
Hu, Dongbo
Zelaya, René A.
Hogan, Deborah A.
Greene, Casey S.
collection PubMed
description BACKGROUND: Gene set enrichment analysis and overrepresentation analyses are commonly used methods to determine the biological processes affected by a differential expression experiment. This approach requires biologically relevant gene sets, which are currently curated manually, limiting their availability and accuracy in many organisms without extensively curated resources. New feature learning approaches can now be paired with existing data collections to directly extract functional gene sets from big data.
RESULTS: Here we introduce a method to identify perturbed processes. In contrast with methods that use curated gene sets, this approach uses signatures extracted from public expression data. We first extract expression signatures from public data using ADAGE, a neural network-based feature extraction approach. We next identify signatures that are differentially active under a given treatment. Our results demonstrate that these signatures represent biological processes that are perturbed by the experiment. Because these signatures are directly learned from data without supervision, they can identify uncurated or novel biological processes. We implemented ADAGE signature analysis for the bacterial pathogen Pseudomonas aeruginosa. For the convenience of different user groups, we implemented both an R package (ADAGEpath) and a web server (http://adage.greenelab.com) to run these analyses. Both are open-source to allow easy expansion to other organisms or signature generation methods. We applied ADAGE signature analysis to an example dataset in which wild-type and ∆anr mutant cells were grown as biofilms on the Cystic Fibrosis genotype bronchial epithelial cells. We mapped active signatures in the dataset to KEGG pathways and compared with pathways identified using GSEA. The two approaches generally return consistent results; however, ADAGE signature analysis also identified a signature that revealed the molecularly supported link between the MexT regulon and Anr.
CONCLUSIONS: We designed ADAGE signature analysis to perform gene set analysis using data-defined functional gene signatures. This approach addresses an important gap for biologists studying non-traditional model organisms and those without extensive curated resources available. We built both an R package and web server to provide ADAGE signature analysis to the community.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-017-1905-4) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5700673
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5700673 2017-12-01 ADAGE signature analysis: differential expression analysis with data-defined gene sets Tan, Jie Huyck, Matthew Hu, Dongbo Zelaya, René A. Hogan, Deborah A. Greene, Casey S. BMC Bioinformatics Methodology Article BioMed Central 2017-11-22 /pmc/articles/PMC5700673/ /pubmed/29166858 http://dx.doi.org/10.1186/s12859-017-1905-4 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title ADAGE signature analysis: differential expression analysis with data-defined gene sets
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5700673/
https://www.ncbi.nlm.nih.gov/pubmed/29166858
http://dx.doi.org/10.1186/s12859-017-1905-4