MetaMap: an atlas of metatranscriptomic reads in human disease-related RNA-seq data

BACKGROUND: With the advent of the age of big data in bioinformatics, large volumes of data and high-performance computing power enable researchers to perform re-analyses of publicly available datasets at an unprecedented scale. Ever more studies implicate the microbiome in both normal human physiology...

Bibliographic Details
Main Authors: Simon, L M, Karg, S, Westermann, A J, Engel, M, Elbehery, A H A, Hense, B, Heinig, M, Deng, L, Theis, F J
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6025204/
https://www.ncbi.nlm.nih.gov/pubmed/29901703
http://dx.doi.org/10.1093/gigascience/giy070
_version_ 1783336228658085888
author Simon, L M
Karg, S
Westermann, A J
Engel, M
Elbehery, A H A
Hense, B
Heinig, M
Deng, L
Theis, F J
collection PubMed
description BACKGROUND: With the advent of the age of big data in bioinformatics, large volumes of data and high-performance computing power enable researchers to perform re-analyses of publicly available datasets at an unprecedented scale. Ever more studies implicate the microbiome in both normal human physiology and a wide range of diseases. RNA sequencing technology (RNA-seq) is commonly used to infer global eukaryotic gene expression patterns under defined conditions, including human disease-related contexts; however, its generic nature also enables the detection of microbial and viral transcripts. FINDINGS: We developed a bioinformatic pipeline to screen existing human RNA-seq datasets for the presence of microbial and viral reads by re-inspecting the non-human-mapping read fraction. We validated this approach by recapitulating outcomes from six independent, controlled infection experiments of cell line models and compared them with an alternative metatranscriptomic mapping strategy. We then applied the pipeline to close to 150 terabytes of publicly available raw RNA-seq data from more than 17,000 samples from more than 400 studies relevant to human disease using state-of-the-art high-performance computing systems. The resulting data from this large-scale re-analysis are made available in the presented MetaMap resource. CONCLUSIONS: Our results demonstrate that common human RNA-seq data, including those archived in public repositories, might contain valuable information to correlate microbial and viral detection patterns with diverse diseases. The presented MetaMap database thus provides a rich resource for hypothesis generation toward the role of the microbiome in human disease. Additionally, code to process new datasets and perform statistical analyses is made available.
format Online
Article
Text
id pubmed-6025204
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6025204 2018-07-10 MetaMap: an atlas of metatranscriptomic reads in human disease-related RNA-seq data Simon, L M Karg, S Westermann, A J Engel, M Elbehery, A H A Hense, B Heinig, M Deng, L Theis, F J Gigascience Data Note BACKGROUND: With the advent of the age of big data in bioinformatics, large volumes of data and high-performance computing power enable researchers to perform re-analyses of publicly available datasets at an unprecedented scale. Ever more studies implicate the microbiome in both normal human physiology and a wide range of diseases. RNA sequencing technology (RNA-seq) is commonly used to infer global eukaryotic gene expression patterns under defined conditions, including human disease-related contexts; however, its generic nature also enables the detection of microbial and viral transcripts. FINDINGS: We developed a bioinformatic pipeline to screen existing human RNA-seq datasets for the presence of microbial and viral reads by re-inspecting the non-human-mapping read fraction. We validated this approach by recapitulating outcomes from six independent, controlled infection experiments of cell line models and compared them with an alternative metatranscriptomic mapping strategy. We then applied the pipeline to close to 150 terabytes of publicly available raw RNA-seq data from more than 17,000 samples from more than 400 studies relevant to human disease using state-of-the-art high-performance computing systems. The resulting data from this large-scale re-analysis are made available in the presented MetaMap resource. CONCLUSIONS: Our results demonstrate that common human RNA-seq data, including those archived in public repositories, might contain valuable information to correlate microbial and viral detection patterns with diverse diseases. The presented MetaMap database thus provides a rich resource for hypothesis generation toward the role of the microbiome in human disease. Additionally, code to process new datasets and perform statistical analyses is made available. Oxford University Press 2018-06-12 /pmc/articles/PMC6025204/ /pubmed/29901703 http://dx.doi.org/10.1093/gigascience/giy070 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title MetaMap: an atlas of metatranscriptomic reads in human disease-related RNA-seq data
topic Data Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6025204/
https://www.ncbi.nlm.nih.gov/pubmed/29901703
http://dx.doi.org/10.1093/gigascience/giy070