Biases in genome reconstruction from metagenomic data

BACKGROUND: Advances in sequencing, assembly, and assortment of contigs into species-specific bins have enabled the reconstruction of genomes from metagenomic data (MAGs). Though a powerful technique, it is difficult to determine whether assembly and binning techniques are accurate when applied to en...

Bibliographic Details
Main authors: Nelson, William C., Tully, Benjamin J., Mobberley, Jennifer M.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7605220/
https://www.ncbi.nlm.nih.gov/pubmed/33194386
http://dx.doi.org/10.7717/peerj.10119
collection PubMed
description BACKGROUND: Advances in sequencing, assembly, and assortment of contigs into species-specific bins have enabled the reconstruction of genomes from metagenomic data (MAGs). Though a powerful technique, it is difficult to determine whether assembly and binning techniques are accurate when applied to environmental metagenomes due to a lack of complete reference genome sequences against which to check the resulting MAGs.
METHODS: We compared MAGs derived from an enrichment culture containing ~20 organisms to complete genome sequences of 10 organisms isolated from the enrichment culture. Factors commonly considered in binning software (nucleotide composition and sequence repetitiveness) were calculated for both the correctly binned and not-binned regions. This direct comparison revealed biases in sequence characteristics and gene content in the not-binned regions. Additionally, the composition of three public data sets representing MAGs reconstructed from the Tara Oceans metagenomic data was compared to a set of representative genomes available through NCBI RefSeq to verify that the biases identified were observable in more complex data sets and using three contemporary binning software packages.
RESULTS: Repeat sequences were frequently not binned in the genome reconstruction processes, as were sequence regions with variant nucleotide composition. Genes encoded on the not-binned regions were strongly biased towards ribosomal RNAs, transfer RNAs, mobile element functions and genes of unknown function.
Our results support genome reconstruction as a robust process and suggest that reconstructions determined to be >90% complete are likely to effectively represent organismal function; however, population-level genotypic heterogeneity in natural populations, such as uneven distribution of plasmids, can lead to incorrect inferences.
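The abstract names nucleotide composition as one of the factors compared between binned and not-binned regions. The following is a minimal Python sketch of how such a comparison could be set up, not the authors' actual pipeline: the function names (`gc_content`, `kmer_frequencies`, `composition_distance`), the choice of tetranucleotides (k = 4), and the Euclidean distance are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in BASES)
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def kmer_frequencies(seq: str, k: int = 4) -> dict[str, float]:
    """Relative frequency of every possible k-mer (hypothetical helper).

    Windows containing ambiguous bases such as N are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(BASES)
    )
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product(BASES, repeat=k))
    return {km: (counts[km] / total if total else 0.0) for km in all_kmers}


def composition_distance(a: str, b: str, k: int = 4) -> float:
    """Euclidean distance between two k-mer frequency profiles.

    The distance measure is an assumption; binning tools differ in
    which measure they use.
    """
    fa, fb = kmer_frequencies(a, k), kmer_frequencies(b, k)
    return sum((fa[km] - fb[km]) ** 2 for km in fa) ** 0.5
```

Under these assumptions, a region whose `composition_distance` to the rest of its genome is large would be the kind of "variant nucleotide composition" region the abstract reports as frequently left unbinned.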
format Online
Article
Text
id pubmed-7605220
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7605220 2020-11-12 Biases in genome reconstruction from metagenomic data Nelson, William C. Tully, Benjamin J. Mobberley, Jennifer M. PeerJ Bioinformatics PeerJ Inc. 2020-10-30 /pmc/articles/PMC7605220/ /pubmed/33194386 http://dx.doi.org/10.7717/peerj.10119 Text en © 2020 Nelson et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Biases in genome reconstruction from metagenomic data
topic Bioinformatics