Review, Evaluation, and Directions for Gene-Targeted Assembly for Ecological Analyses of Metagenomes

Shotgun metagenomics has greatly advanced our understanding of microbial communities over the last decade. Metagenomic analyses often include assembly and genome binning, computationally daunting tasks especially for big data from complex environments such as soil and sediments. In many studies, how...

Bibliographic Details
Main authors: Guo, Jiarong; Quensen, John F.; Sun, Yanni; Wang, Qiong; Brown, C. Titus; Cole, James R.; Tiedje, James M.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6843070/
https://www.ncbi.nlm.nih.gov/pubmed/31749830
http://dx.doi.org/10.3389/fgene.2019.00957
author Guo, Jiarong
Quensen, John F.
Sun, Yanni
Wang, Qiong
Brown, C. Titus
Cole, James R.
Tiedje, James M.
collection PubMed
description Shotgun metagenomics has greatly advanced our understanding of microbial communities over the last decade. Metagenomic analyses often include assembly and genome binning, which are computationally daunting tasks, especially for big data from complex environments such as soil and sediments. In many studies, however, only a subset of genes and pathways involved in specific functions is of interest; thus, it is not necessary to attempt global assembly. In addition, methods that target genes can be computationally more efficient and produce more accurate assemblies by leveraging rich databases, especially for genes of broad interest such as those involved in biogeochemical cycles, biodegradation, and antibiotic resistance, or those used as phylogenetic markers. Here, we review six gene-targeted assemblers with unique algorithms for extracting and/or assembling targeted genes: Xander, MegaGTA, SAT-Assembler, HMM-GRASPx, GenSeed-HMM, and MEGAN. We tested these tools using two datasets with known genomes (a synthetic community of artificial reads derived from the genomes of 17 bacteria, and shotgun sequence data from a mock community with 48 bacterial and 16 archaeal genomes) and a large soil shotgun metagenomic dataset. We compared assemblies of a universal single-copy gene (rplB) and two N cycle genes (nifH and nirK). We measured the tools' computational efficiency, sensitivity, specificity, and chimera rate and found that Xander and MegaGTA, which both use a probabilistic graph structure to model the genes, have the best overall performance with all three datasets, although MEGAN, a reference-matching assembler, had better sensitivity with synthetic and mock community members chosen from its reference collection. Also, Xander and MegaGTA are the only tools that include post-assembly scripts tuned for common molecular ecology and diversity analyses. Additionally, we provide a mathematical model for estimating the probability of assembling targeted genes in a metagenome, which can be used to estimate the required sequencing depth.
format Online
Article
Text
id pubmed-6843070
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6843070 2019-11-20 Front Genet Genetics Frontiers Media S.A. 2019-10-15 /pmc/articles/PMC6843070/ /pubmed/31749830 http://dx.doi.org/10.3389/fgene.2019.00957 Text en Copyright © 2019 Guo, Quensen, Sun, Wang, Brown, Cole and Tiedje http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Review, Evaluation, and Directions for Gene-Targeted Assembly for Ecological Analyses of Metagenomes
topic Genetics