ADAGE-Based Integration of Publicly Available Pseudomonas aeruginosa Gene Expression Data with Denoising Autoencoders Illuminates Microbe-Host Interactions

Bibliographic Details
Main Authors: Tan, Jie, Hammond, John H., Hogan, Deborah A., Greene, Casey S.
Format: Online Article Text
Language: English
Published: American Society of Microbiology 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5069748/
https://www.ncbi.nlm.nih.gov/pubmed/27822512
http://dx.doi.org/10.1128/mSystems.00025-15
_version_ 1782460994187952128
author Tan, Jie
Hammond, John H.
Hogan, Deborah A.
Greene, Casey S.
author_facet Tan, Jie
Hammond, John H.
Hogan, Deborah A.
Greene, Casey S.
author_sort Tan, Jie
collection PubMed
description The increasing number of genome-wide assays of gene expression available from public databases presents opportunities for computational methods that facilitate hypothesis generation and biological interpretation of these data. We present an unsupervised machine learning approach, ADAGE (analysis using denoising autoencoders of gene expression), and apply it to the publicly available gene expression data compendium for Pseudomonas aeruginosa. In this approach, the machine-learned ADAGE model contained 50 nodes which we predicted would correspond to gene expression patterns across the gene expression compendium. While no biological knowledge was used during model construction, co-operonic genes had similar weights across nodes, and genes with similar weights across nodes were significantly more likely to share KEGG pathways. By analyzing newly generated and previously published microarray and transcriptome sequencing data, the ADAGE model identified differences between strains, modeled the cellular response to low oxygen, and predicted the involvement of biological processes based on low-level gene expression differences. ADAGE compared favorably with traditional principal component analysis and independent component analysis approaches in its ability to extract validated patterns, and based on our analyses, we propose that these approaches differ in the types of patterns they preferentially identify. We provide the ADAGE model with analysis of all publicly available P. aeruginosa GeneChip experiments and open source code for use with other species and settings. Extraction of consistent patterns across large-scale collections of genomic data using methods like ADAGE provides the opportunity to identify general principles and biologically important patterns in microbial biology. This approach will be particularly useful in less-well-studied microbial species.
IMPORTANCE The quantity and breadth of genome-scale data sets that examine RNA expression in diverse bacterial and eukaryotic species are increasing more rapidly than curated knowledge. Our ADAGE method integrates such data without requiring gene function, gene pathway, or experiment labeling, making its application to any large gene expression compendium practical. We built a Pseudomonas aeruginosa ADAGE model from a diverse set of publicly available experiments without any prespecified biological knowledge, and this model was accurate and predictive. We provide ADAGE results for the complete P. aeruginosa GeneChip compendium for use by researchers studying P. aeruginosa and source code that facilitates ADAGE’s application to other species and data types. Author Video: An author video summary of this article is available.
format Online
Article
Text
id pubmed-5069748
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher American Society of Microbiology
record_format MEDLINE/PubMed
spelling pubmed-5069748 2016-11-07 ADAGE-Based Integration of Publicly Available Pseudomonas aeruginosa Gene Expression Data with Denoising Autoencoders Illuminates Microbe-Host Interactions Tan, Jie Hammond, John H. Hogan, Deborah A. Greene, Casey S. mSystems Research Article The increasing number of genome-wide assays of gene expression available from public databases presents opportunities for computational methods that facilitate hypothesis generation and biological interpretation of these data. We present an unsupervised machine learning approach, ADAGE (analysis using denoising autoencoders of gene expression), and apply it to the publicly available gene expression data compendium for Pseudomonas aeruginosa. In this approach, the machine-learned ADAGE model contained 50 nodes which we predicted would correspond to gene expression patterns across the gene expression compendium. While no biological knowledge was used during model construction, co-operonic genes had similar weights across nodes, and genes with similar weights across nodes were significantly more likely to share KEGG pathways. By analyzing newly generated and previously published microarray and transcriptome sequencing data, the ADAGE model identified differences between strains, modeled the cellular response to low oxygen, and predicted the involvement of biological processes based on low-level gene expression differences. ADAGE compared favorably with traditional principal component analysis and independent component analysis approaches in its ability to extract validated patterns, and based on our analyses, we propose that these approaches differ in the types of patterns they preferentially identify. We provide the ADAGE model with analysis of all publicly available P. aeruginosa GeneChip experiments and open source code for use with other species and settings.
Extraction of consistent patterns across large-scale collections of genomic data using methods like ADAGE provides the opportunity to identify general principles and biologically important patterns in microbial biology. This approach will be particularly useful in less-well-studied microbial species. IMPORTANCE The quantity and breadth of genome-scale data sets that examine RNA expression in diverse bacterial and eukaryotic species are increasing more rapidly than curated knowledge. Our ADAGE method integrates such data without requiring gene function, gene pathway, or experiment labeling, making its application to any large gene expression compendium practical. We built a Pseudomonas aeruginosa ADAGE model from a diverse set of publicly available experiments without any prespecified biological knowledge, and this model was accurate and predictive. We provide ADAGE results for the complete P. aeruginosa GeneChip compendium for use by researchers studying P. aeruginosa and source code that facilitates ADAGE’s application to other species and data types. Author Video: An author video summary of this article is available. American Society of Microbiology 2016-01-19 /pmc/articles/PMC5069748/ /pubmed/27822512 http://dx.doi.org/10.1128/mSystems.00025-15 Text en Copyright © 2016 Tan et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Research Article
Tan, Jie
Hammond, John H.
Hogan, Deborah A.
Greene, Casey S.
ADAGE-Based Integration of Publicly Available Pseudomonas aeruginosa Gene Expression Data with Denoising Autoencoders Illuminates Microbe-Host Interactions
title ADAGE-Based Integration of Publicly Available Pseudomonas aeruginosa Gene Expression Data with Denoising Autoencoders Illuminates Microbe-Host Interactions
title_full ADAGE-Based Integration of Publicly Available Pseudomonas aeruginosa Gene Expression Data with Denoising Autoencoders Illuminates Microbe-Host Interactions
title_fullStr ADAGE-Based Integration of Publicly Available Pseudomonas aeruginosa Gene Expression Data with Denoising Autoencoders Illuminates Microbe-Host Interactions
title_full_unstemmed ADAGE-Based Integration of Publicly Available Pseudomonas aeruginosa Gene Expression Data with Denoising Autoencoders Illuminates Microbe-Host Interactions
title_short ADAGE-Based Integration of Publicly Available Pseudomonas aeruginosa Gene Expression Data with Denoising Autoencoders Illuminates Microbe-Host Interactions
title_sort adage-based integration of publicly available pseudomonas aeruginosa gene expression data with denoising autoencoders illuminates microbe-host interactions
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5069748/
https://www.ncbi.nlm.nih.gov/pubmed/27822512
http://dx.doi.org/10.1128/mSystems.00025-15
work_keys_str_mv AT tanjie adagebasedintegrationofpubliclyavailablepseudomonasaeruginosageneexpressiondatawithdenoisingautoencodersilluminatesmicrobehostinteractions
AT hammondjohnh adagebasedintegrationofpubliclyavailablepseudomonasaeruginosageneexpressiondatawithdenoisingautoencodersilluminatesmicrobehostinteractions
AT hogandeboraha adagebasedintegrationofpubliclyavailablepseudomonasaeruginosageneexpressiondatawithdenoisingautoencodersilluminatesmicrobehostinteractions
AT greenecaseys adagebasedintegrationofpubliclyavailablepseudomonasaeruginosageneexpressiondatawithdenoisingautoencodersilluminatesmicrobehostinteractions