
A method for achieving complete microbial genomes and improving bins from metagenomics data

Metagenomics facilitates the study of the genetic information from uncultured microbes and complex microbial communities. Assembling complete genomes from metagenomics data is difficult because most samples have high organismal complexity and strain diversity. Some studies have attempted to extract...

Bibliographic Details
Main Authors: Lui, Lauren M., Nielsen, Torben N., Arkin, Adam P.
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8172020/
https://www.ncbi.nlm.nih.gov/pubmed/33961626
http://dx.doi.org/10.1371/journal.pcbi.1008972
collection PubMed
description Metagenomics facilitates the study of the genetic information from uncultured microbes and complex microbial communities. Assembling complete genomes from metagenomics data is difficult because most samples have high organismal complexity and strain diversity. Some studies have attempted to extract complete bacterial, archaeal, and viral genomes and often focus on species with circular genomes so they can help confirm completeness with circularity. However, fewer than 100 circularized bacterial and archaeal genomes have been assembled and published from metagenomics data despite the thousands of datasets that are available. Circularized genomes are important for (1) building a reference collection as scaffolds for future assemblies, (2) providing complete gene content of a genome, (3) confirming little or no contamination of a genome, (4) studying the genomic context and synteny of genes, and (5) linking protein coding genes to ribosomal RNA genes to aid metabolic inference in 16S rRNA gene sequencing studies. We developed a semi-automated method called Jorg to help circularize small bacterial, archaeal, and viral genomes using iterative assembly, binning, and read mapping. In addition, this method exposes potential misassemblies from k-mer-based assemblies. We chose species of the Candidate Phyla Radiation (CPR) to focus our initial efforts because they have small genomes and are only known to have one ribosomal RNA operon. In addition to 34 circular CPR genomes, we present one circular Margulisbacteria genome, one circular Chloroflexi genome, and two circular megaphage genomes from 19 public and published datasets. We demonstrate findings that would likely be difficult without circularizing genomes, including that ribosomal genes are likely not operonic in the majority of CPR, and that some CPR harbor diverged forms of RNase P RNA. Code and a tutorial for this method are available at https://github.com/lmlui/Jorg, and the method is available on the DOE Systems Biology KnowledgeBase as a beta app.
id pubmed-8172020
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8172020 2021-06-14 A method for achieving complete microbial genomes and improving bins from metagenomics data Lui, Lauren M. Nielsen, Torben N. Arkin, Adam P. PLoS Comput Biol Research Article Public Library of Science 2021-05-07 /pmc/articles/PMC8172020/ /pubmed/33961626 http://dx.doi.org/10.1371/journal.pcbi.1008972 Text en © 2021 Lui et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article