HumGut: a comprehensive human gut prokaryotic genomes collection filtered by metagenome data

Bibliographic Details
Main Authors: Hiseni, Pranvera; Rudi, Knut; Wilson, Robert C.; Hegge, Finn Terje; Snipen, Lars
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8325300/
https://www.ncbi.nlm.nih.gov/pubmed/34330336
http://dx.doi.org/10.1186/s40168-021-01114-w
Collection: PubMed
Description:
BACKGROUND: A major bottleneck in the use of metagenome sequencing for human gut microbiome studies has been the lack of a comprehensive genome collection to use as a reference database. Several recent efforts have reconstructed genomes from human gut metagenome data, greatly increasing the number of relevant genomes. In this work, we aimed to create a collection of the most prevalent prokaryotic genomes of the healthy human gut, to be used as a reference database, including both metagenome-assembled genomes (MAGs) from the human gut and ordinary RefSeq genomes.
RESULTS: We screened > 5,700 healthy human gut metagenomes for the containment of > 490,000 publicly available prokaryotic genomes sourced from RefSeq and the recently announced UHGG collection. This yielded a pool of > 381,000 genomes, which were then scored and ranked by their prevalence in the healthy human gut metagenomes. The genomes were clustered at 97.5% sequence identity, and the cluster representatives (30,691 in total) were retained to form the HumGut collection. Using Kraken2 for classification, HumGut showed superior performance in assigning metagenomic reads, classifying on average 94.5% of the reads in a metagenome, compared with 86% using UHGG and 44% using the standard Kraken2 database. A coarser HumGut collection, with genomes dereplicated at 95% sequence identity (similar to UHGG), classified 88.25% of the reads. HumGut is half the size of the standard Kraken2 database and directly comparable in size to UHGG, yet outperforms both.
CONCLUSIONS: The HumGut collection contains > 30,000 genomes clustered at 97.5% sequence identity and ranked by human gut prevalence. We demonstrate that metagenomes from IBD patients map equally well to this collection, indicating that this reference is also relevant for studies well outside the metagenome reference set used to build HumGut. All data and metadata, as well as helpful code, are available at http://arken.nmbu.no/~larssn/humgut/.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40168-021-01114-w.
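The abstract outlines three steps: screening metagenomes for the containment of each genome, scoring and ranking genomes by prevalence, and clustering at 97.5% sequence identity while keeping one representative per cluster. The Python sketch below illustrates that idea on toy sequences; it is not the authors' code. Several assumptions are introduced here and are not taken from the record: exact k-mer sets instead of sketches, a hypothetical containment cutoff, a simplified prevalence score (the fraction of metagenomes meeting that cutoff), sequence identity approximated as 1 - Mash distance from k-mer Jaccard similarity, and greedy clustering in score order so that the highest-ranked genome in each cluster becomes its representative. The function names (`kmers`, `containment`, `prevalence`, `mash_identity`, `greedy_dereplicate`) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(genome: Set[str], metagenome: Set[str]) -> float:
    """Fraction of the genome's k-mers that also occur in the metagenome."""
    return len(genome & metagenome) / len(genome) if genome else 0.0


def prevalence(genome: Set[str], metagenomes: Sequence[Set[str]],
               min_containment: float) -> float:
    """Share of metagenomes whose containment of the genome reaches
    min_containment (a hypothetical cutoff, not a value from the paper)."""
    if not metagenomes:
        return 0.0
    hits = sum(containment(genome, m) >= min_containment for m in metagenomes)
    return hits / len(metagenomes)


def mash_identity(a: Set[str], b: Set[str], k: int) -> float:
    """Approximate identity as 1 - Mash distance, where
    D = -(1/k) * ln(2j / (1 + j)) and j is the exact k-mer Jaccard index.
    This is an assumed stand-in for a dedicated identity/ANI tool."""
    union = len(a | b)
    j = len(a & b) / union if union else 0.0
    if j == 0.0:
        return 0.0
    return max(0.0, 1.0 + math.log(2 * j / (1 + j)) / k)


def greedy_dereplicate(ranked: List[str], identity: Callable[[str, str], float],
                       threshold: float = 0.975) -> Dict[str, List[str]]:
    """Visit genomes from highest to lowest score. Each genome joins the first
    existing representative it matches at >= threshold identity; otherwise it
    becomes the representative of a new cluster."""
    clusters: Dict[str, List[str]] = {}
    for g in ranked:
        for rep, members in clusters.items():
            if identity(g, rep) >= threshold:
                members.append(g)
                break
        else:
            clusters[g] = [g]
    return clusters


# Toy data (hypothetical): three short "genomes" and two "metagenomes".
K = 4  # tiny k for the toy example; real genomes use much larger k
genomes = {"gA": "ACGTACGTGG", "gB": "ACGTACGTGC", "gC": "TTTTGGGGCC"}
sets = {name: kmers(seq, K) for name, seq in genomes.items()}
metagenomes = [kmers("ACGTACGTGGAAA", K), kmers("TTTTGGGGCCACGTACGTGC", K)]

scores = {name: prevalence(s, metagenomes, min_containment=0.8)
          for name, s in sets.items()}
ranked = sorted(scores, key=lambda name: -scores[name])  # stable: ties keep input order


def ident(x: str, y: str) -> float:
    """Identity between two named toy genomes."""
    return mash_identity(sets[x], sets[y], K)


print(greedy_dereplicate(ranked, ident, threshold=0.975))
# {'gA': ['gA'], 'gB': ['gB'], 'gC': ['gC']}
print(greedy_dereplicate(ranked, ident, threshold=0.95))
# {'gA': ['gA', 'gB'], 'gC': ['gC']}
```

Lowering the threshold from 0.975 to 0.95 loosely mirrors the coarser 95% variant mentioned in the abstract: the two near-identical toy genomes `gA` and `gB` (estimated identity about 0.954) merge into one cluster, represented by the higher-ranked `gA`.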
Record ID: pubmed-8325300
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Microbiome (Research); BioMed Central, 2021-07-31
Rights: © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.