Evaluating metagenomic assembly approaches for biome-specific gene catalogues

BACKGROUND: For many environments, biome-specific microbial gene catalogues are being recovered using shotgun metagenomics followed by assembly and gene calling on the assembled contigs. The assembly is typically conducted either by individually assembling each sample or by co-assembling reads from...

Bibliographic Details
Main authors: Delgado, Luis Fernando; Andersson, Anders F.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9074274/
https://www.ncbi.nlm.nih.gov/pubmed/35524337
http://dx.doi.org/10.1186/s40168-022-01259-2
author Delgado, Luis Fernando
Andersson, Anders F.
collection PubMed
description BACKGROUND: For many environments, biome-specific microbial gene catalogues are being recovered using shotgun metagenomics followed by assembly and gene calling on the assembled contigs. The assembly is typically conducted either by individually assembling each sample or by co-assembling reads from all the samples. The co-assembly approach can potentially recover genes that display too low abundance to be assembled from individual samples. On the other hand, combining samples increases the risk of mixing data from closely related strains, which can hamper the assembly process. In this respect, assembly on individual samples followed by clustering of (near) identical genes is preferable. Thus, both approaches have potential pros and cons, but it remains to be evaluated which assembly strategy is most effective. Here, we have evaluated three assembly strategies for generating gene catalogues from metagenomes using a dataset of 124 samples from the Baltic Sea: (1) assembly on individual samples followed by clustering of the resulting genes, (2) co-assembly on all samples, and (3) mix assembly, combining individual and co-assembly. RESULTS: The mix-assembly approach resulted in a more extensive nonredundant gene set than the other approaches and with more genes predicted to be complete and that could be functionally annotated. The mix assembly consists of 67 million genes (Baltic Sea gene set, BAGS) that have been functionally and taxonomically annotated. The majority of the BAGS genes are dissimilar (< 95% amino acid identity) to the Tara Oceans gene dataset, and hence, BAGS represents a valuable resource for brackish water research. CONCLUSION: The mix-assembly approach represents a feasible approach to increase the information obtained from metagenomic samples. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40168-022-01259-2.
format Online
Article
Text
id pubmed-9074274
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9074274 2022-05-07 Evaluating metagenomic assembly approaches for biome-specific gene catalogues Delgado, Luis Fernando Andersson, Anders F. Microbiome Research BioMed Central 2022-05-06 /pmc/articles/PMC9074274/ /pubmed/35524337 http://dx.doi.org/10.1186/s40168-022-01259-2 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Evaluating metagenomic assembly approaches for biome-specific gene catalogues
topic Research