Accurate and complete genomes from metagenomes

Genomes are an integral component of the biological information about an organism; thus, the more complete the genome, the more informative it is. Historically, bacterial and archaeal genomes were reconstructed from pure (monoclonal) cultures, and the first reported sequences were manually curated t...

Bibliographic Details
Main Authors: Chen, Lin-Xing; Anantharaman, Karthik; Shaiber, Alon; Eren, A. Murat; Banfield, Jillian F.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2020
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7111523/
https://www.ncbi.nlm.nih.gov/pubmed/32188701
http://dx.doi.org/10.1101/gr.258640.119
author Chen, Lin-Xing
Anantharaman, Karthik
Shaiber, Alon
Eren, A. Murat
Banfield, Jillian F.
collection PubMed
description Genomes are an integral component of the biological information about an organism; thus, the more complete the genome, the more informative it is. Historically, bacterial and archaeal genomes were reconstructed from pure (monoclonal) cultures, and the first reported sequences were manually curated to completion. However, the bottleneck imposed by the requirement for isolates precluded genomic insights for the vast majority of microbial life. Shotgun sequencing of microbial communities, referred to initially as community genomics and subsequently as genome-resolved metagenomics, can circumvent this limitation by obtaining metagenome-assembled genomes (MAGs); but gaps, local assembly errors, chimeras, and contamination by fragments from other genomes limit the value of these genomes. Here, we discuss genome curation to improve and, in some cases, achieve complete (circularized, no gaps) MAGs (CMAGs). To date, few CMAGs have been generated, although notably some are from very complex systems such as soil and sediment. Through analysis of about 7000 published complete bacterial isolate genomes, we verify the value of cumulative GC skew in combination with other metrics to establish bacterial genome sequence accuracy. The analysis of cumulative GC skew identified potential misassemblies in some reference genomes of isolated bacteria and the repeat sequences that likely gave rise to them. We discuss methods that could be implemented in bioinformatic approaches for curation to ensure that metabolic and evolutionary analyses can be based on very high-quality genomes.
format Online
Article
Text
id pubmed-7111523
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-7111523 2020-04-03 Accurate and complete genomes from metagenomes Chen, Lin-Xing Anantharaman, Karthik Shaiber, Alon Eren, A. Murat Banfield, Jillian F. Genome Res Review
Cold Spring Harbor Laboratory Press 2020-03 /pmc/articles/PMC7111523/ /pubmed/32188701 http://dx.doi.org/10.1101/gr.258640.119 Text en © 2020 Chen et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
title Accurate and complete genomes from metagenomes
topic Review