Biases in genome reconstruction from metagenomic data
Main authors: | Nelson, William C.; Tully, Benjamin J.; Mobberley, Jennifer M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc., 2020 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7605220/ https://www.ncbi.nlm.nih.gov/pubmed/33194386 http://dx.doi.org/10.7717/peerj.10119 |
author | Nelson, William C.; Tully, Benjamin J.; Mobberley, Jennifer M. |
---|---|
collection | PubMed |
description | BACKGROUND: Advances in sequencing, assembly, and assortment of contigs into species-specific bins have enabled the reconstruction of genomes from metagenomic data (metagenome-assembled genomes, MAGs). Though the technique is powerful, it is difficult to determine whether assembly and binning are accurate when applied to environmental metagenomes, because complete reference genome sequences against which to check the resulting MAGs are lacking. METHODS: We compared MAGs derived from an enrichment culture containing ~20 organisms to the complete genome sequences of 10 organisms isolated from the same culture. Factors commonly considered by binning software (nucleotide composition and sequence repetitiveness) were calculated for both the correctly binned and the not-binned regions. This direct comparison revealed biases in sequence characteristics and gene content in the not-binned regions. Additionally, the composition of three public data sets representing MAGs reconstructed from the Tara Oceans metagenomic data was compared to a set of representative genomes available through NCBI RefSeq, to verify that the identified biases were observable in more complex data sets and across three contemporary binning software packages. RESULTS: Repeat sequences were frequently not binned during genome reconstruction, as were sequence regions with variant nucleotide composition. Genes encoded on the not-binned regions were strongly biased towards ribosomal RNAs, transfer RNAs, mobile element functions, and genes of unknown function. Our results support genome reconstruction as a robust process and suggest that reconstructions determined to be >90% complete are likely to represent organismal function effectively; however, population-level genotypic heterogeneity in natural populations, such as uneven distribution of plasmids, can lead to incorrect inferences. |
id | pubmed-7605220 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
published | PeerJ, 2020-10-30 |
rights | © 2020 Nelson et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
topic | Bioinformatics |
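The abstract describes comparing MAGs against complete isolate genomes and computing nucleotide composition and sequence repetitiveness for binned and not-binned regions. The Python sketch below illustrates one way such a comparison could be set up; it is not the authors' pipeline. It assumes the binned parts of a reference genome are already known as half-open `(start, end)` coordinates (for example, from aligning MAG contigs to the isolate genome). GC fraction, tetranucleotide frequency, and a forward-strand k-mer repeat fraction are stand-ins for whatever composition and repetitiveness measures the study actually used, and every function name here is hypothetical.

```python
"""Hypothetical sketch: compare composition and repetitiveness of binned vs.
not-binned regions of a complete reference genome. Not the authors' pipeline."""
from collections import Counter
from itertools import product


def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def not_binned_regions(binned, genome_length):
    """Parts of [0, genome_length) not covered by any binned interval."""
    gaps, cursor = [], 0
    for start, end in merge_intervals(binned):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < genome_length:
        gaps.append((cursor, genome_length))
    return gaps


def gc_fraction(seq):
    """Fraction of G+C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def tetranucleotide_frequencies(seq):
    """Relative frequency of all 256 tetranucleotides; windows containing
    non-ACGT characters are ignored and reverse complements are not merged."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


def repeated_kmer_fraction(genome, start, end, k=21):
    """Fraction of k-mers lying entirely inside [start, end) that occur more
    than once anywhere in the genome (forward strand only): a crude repeat score."""
    genome = genome.upper()
    genome_counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    region = [genome[i:i + k] for i in range(start, end - k + 1)]
    if not region:
        return 0.0
    return sum(1 for km in region if genome_counts[km] > 1) / len(region)


if __name__ == "__main__":
    # Toy "reference genome"; the binned coordinates are made up for illustration.
    genome = "ATGCGC" * 10 + "AAAATTTT" * 5 + "GATTACA" * 3
    binned = [(0, 60), (100, len(genome))]
    for label, regions in (("binned", binned),
                           ("not binned", not_binned_regions(binned, len(genome)))):
        for start, end in regions:
            seq = genome[start:end]
            print(f"{label:>10} {start}-{end}: GC={gc_fraction(seq):.3f} "
                  f"repeats={repeated_kmer_fraction(genome, start, end, k=8):.2f}")
```

Each region is scored separately rather than concatenated, so that no spurious k-mers are created across region junctions. A real analysis would also need to handle reverse-complement repeats and would typically use published tools rather than this toy scoring.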