phyloFlash: Rapid Small-Subunit rRNA Profiling and Targeted Assembly from Metagenomes

Bibliographic Details
Main authors: Gruber-Vodicka, Harald R., Seah, Brandon K. B., Pruesse, Elmar
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2020
Subjects: Methods and Protocols
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7593591/
https://www.ncbi.nlm.nih.gov/pubmed/33109753
http://dx.doi.org/10.1128/mSystems.00920-20
author Gruber-Vodicka, Harald R.
Seah, Brandon K. B.
Pruesse, Elmar
collection PubMed
description The small-subunit rRNA (SSU rRNA) gene is the key marker in molecular ecology for all domains of life, but it is largely absent from metagenome-assembled genomes that often are the only resource available for environmental microbes. Here, we present phyloFlash, a pipeline to overcome this gap with rapid, SSU rRNA-centered taxonomic classification, targeted assembly, and graph-based binning of full metagenomic assemblies. We show that a cleanup of artifacts is pivotal even with a curated reference database. With such a filtered database, the general-purpose mapper BBMap extracts SSU rRNA reads five times faster than the rRNA-specialized tool SortMeRNA, with similar sensitivity and higher selectivity on simulated metagenomes. Reference-based targeted assemblers yielded either highly fragmented assemblies or high levels of chimerism, so we employ the general-purpose genomic assembler SPAdes. Our optimized implementation is independent of reference database composition and has satisfactory levels of chimera formation. phyloFlash quickly processes Illumina (meta)genomic data, is straightforward to use, even as part of high-throughput quality control, and has user-friendly output reports. The software is available at https://github.com/HRGV/phyloFlash (GPL3 license) and is documented with an online manual.
IMPORTANCE To track organisms across all domains of life, the SSU rRNA gene is the gold standard. Many environmental microbes are known only from high-throughput sequence data, but the SSU rRNA gene, the key to visualization by molecular probes and link to existing literature, is often missing from metagenome-assembled genomes (MAGs). The easy-to-use phyloFlash software suite tackles this gap with rapid, SSU rRNA-centered taxonomic classification, targeted assembly, and graph-based linking to MAGs. Starting from a cleaned reference database, phyloFlash profiles the taxonomic diversity and assembles the sorted SSU rRNA reads. The phyloFlash design is domain agnostic and covers eukaryotes, archaea, and bacteria alike. phyloFlash also provides utilities to visualize multisample comparisons and to integrate the recovered SSU rRNAs in a metagenomics workflow by linking them to MAGs using assembly graph parsing.
format Online
Article
Text
id pubmed-7593591
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-7593591 2020-11-06 mSystems Methods and Protocols American Society for Microbiology 2020-10-27 /pmc/articles/PMC7593591/ /pubmed/33109753 http://dx.doi.org/10.1128/mSystems.00920-20 Text en Copyright © 2020 Gruber-Vodicka et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
title phyloFlash: Rapid Small-Subunit rRNA Profiling and Targeted Assembly from Metagenomes
topic Methods and Protocols