GAUGE-Annotated Microbial Transcriptomic Data Facilitate Parallel Mining and High-Throughput Reanalysis To Form Data-Driven Hypotheses

The NCBI Gene Expression Omnibus (GEO) provides tools to query and download transcriptomic data. However, less than 4% of microbial experiments include the sample group annotations required to assess differential gene expression for high-throughput reanalysis, and data deposited after 2014 universal...

Bibliographic Details
Main authors: Li, Zhongyou; Koeppen, Katja; Holden, Victoria I.; Neff, Samuel L.; Cengher, Liviu; Demers, Elora G.; Mould, Dallas L.; Stanton, Bruce A.; Hampton, Thomas H.
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8547006/
https://www.ncbi.nlm.nih.gov/pubmed/33758032
http://dx.doi.org/10.1128/mSystems.01305-20
author Li, Zhongyou
Koeppen, Katja
Holden, Victoria I.
Neff, Samuel L.
Cengher, Liviu
Demers, Elora G.
Mould, Dallas L.
Stanton, Bruce A.
Hampton, Thomas H.
author_sort Li, Zhongyou
collection PubMed
description The NCBI Gene Expression Omnibus (GEO) provides tools to query and download transcriptomic data. However, less than 4% of microbial experiments include the sample group annotations required to assess differential gene expression for high-throughput reanalysis, and data deposited after 2014 universally lack these annotations. Our algorithm GAUGE (general annotation using text/data group ensembles) automatically annotates GEO microbial data sets, including microarray and RNA sequencing studies, increasing the percentage of data sets amenable to analysis from 4% to 33%. Eighty-nine percent of GAUGE-annotated studies matched group assignments generated by human curators. To demonstrate how GAUGE annotation can lead to scientific insight, we created GAPE (GAUGE-annotated Pseudomonas aeruginosa and Escherichia coli transcriptomic compendia for reanalysis), a Shiny Web interface to analyze 73 GAUGE-annotated P. aeruginosa studies, three times more than previously available. GAPE analysis revealed that PA3923, a gene of unknown function, was frequently differentially expressed in more than 50% of studies and significantly coregulated with genes involved in biofilm formation. Follow-up wet-bench experiments demonstrate that PA3923 mutants are indeed defective in biofilm formation, consistent with predictions facilitated by GAUGE and GAPE. We anticipate that GAUGE and GAPE, which we have made freely available, will make publicly available microbial transcriptomic data easier to reuse and lead to new data-driven hypotheses.
IMPORTANCE GEO archives transcriptomic data from over 5,800 microbial experiments and allows researchers to answer questions not directly addressed in published papers. However, less than 4% of the microbial data sets include the sample group annotations required for high-throughput reanalysis. This limitation blocks a considerable amount of microbial transcriptomic data from being reused easily. Here, we demonstrate that the GAUGE algorithm could make 33% of microbial data accessible to parallel mining and reanalysis. GAUGE annotations increase statistical power and, thereby, make consistent patterns of differential gene expression easier to identify. In addition, we developed GAPE (GAUGE-annotated Pseudomonas aeruginosa and Escherichia coli transcriptomic compendia for reanalysis), a Shiny Web interface that performs parallel analyses on P. aeruginosa and E. coli compendia. Source code for GAUGE and GAPE is freely available and can be repurposed to create compendia for other bacterial species.
Author Video: An author video summary of this article is available.
format Online
Article
Text
id pubmed-8547006
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-8547006 2021-10-27 GAUGE-Annotated Microbial Transcriptomic Data Facilitate Parallel Mining and High-Throughput Reanalysis To Form Data-Driven Hypotheses Li, Zhongyou Koeppen, Katja Holden, Victoria I. Neff, Samuel L. Cengher, Liviu Demers, Elora G. Mould, Dallas L. Stanton, Bruce A. Hampton, Thomas H. mSystems Research Article American Society for Microbiology 2021-03-23 /pmc/articles/PMC8547006/ /pubmed/33758032 http://dx.doi.org/10.1128/mSystems.01305-20 Text en Copyright © 2021 Li et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
title GAUGE-Annotated Microbial Transcriptomic Data Facilitate Parallel Mining and High-Throughput Reanalysis To Form Data-Driven Hypotheses
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8547006/
https://www.ncbi.nlm.nih.gov/pubmed/33758032
http://dx.doi.org/10.1128/mSystems.01305-20