DOE JGI Metagenome Workflow

The DOE Joint Genome Institute (JGI) Metagenome Workflow performs metagenome data processing, including assembly; structural, functional, and taxonomic annotation; and binning of metagenomic data sets that are subsequently included into the Integrated Microbial Genomes and Microbiomes (IMG/M) (I.-M....

Bibliographic Details
Main authors: Clum, Alicia; Huntemann, Marcel; Bushnell, Brian; Foster, Brian; Foster, Bryce; Roux, Simon; Hajek, Patrick P.; Varghese, Neha; Mukherjee, Supratim; Reddy, T. B. K.; Daum, Chris; Yoshinaga, Yuko; O’Malley, Ronan; Seshadri, Rekha; Kyrpides, Nikos C.; Eloe-Fadrosh, Emiley A.; Chen, I-Min A.; Copeland, Alex; Ivanova, Natalia N.
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2021
Subjects: Methods and Protocols
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8269246/
https://www.ncbi.nlm.nih.gov/pubmed/34006627
http://dx.doi.org/10.1128/mSystems.00804-20
author Clum, Alicia
Huntemann, Marcel
Bushnell, Brian
Foster, Brian
Foster, Bryce
Roux, Simon
Hajek, Patrick P.
Varghese, Neha
Mukherjee, Supratim
Reddy, T. B. K.
Daum, Chris
Yoshinaga, Yuko
O’Malley, Ronan
Seshadri, Rekha
Kyrpides, Nikos C.
Eloe-Fadrosh, Emiley A.
Chen, I-Min A.
Copeland, Alex
Ivanova, Natalia N.
author_sort Clum, Alicia
collection PubMed
description The DOE Joint Genome Institute (JGI) Metagenome Workflow performs metagenome data processing, including assembly; structural, functional, and taxonomic annotation; and binning of metagenomic data sets that are subsequently included into the Integrated Microbial Genomes and Microbiomes (IMG/M) (I.-M. A. Chen, K. Chu, K. Palaniappan, A. Ratner, et al., Nucleic Acids Res, 49:D751–D763, 2021, https://doi.org/10.1093/nar/gkaa939) comparative analysis system and provided for download via the JGI data portal (https://genome.jgi.doe.gov/portal/). This workflow scales to run on thousands of metagenome samples per year, which can vary by the complexity of microbial communities and sequencing depth. Here, we describe the different tools, databases, and parameters used at different steps of the workflow to help with the interpretation of metagenome data available in IMG and to enable researchers to apply this workflow to their own data. We use 20 publicly available sediment metagenomes to illustrate the computing requirements for the different steps and highlight the typical results of data processing. The workflow modules for read filtering and metagenome assembly are available as a workflow description language (WDL) file (https://code.jgi.doe.gov/BFoster/jgi_meta_wdl). The workflow modules for annotation and binning are provided as a service to the user community at https://img.jgi.doe.gov/submit and require filling out the project and associated metadata descriptions in the Genomes OnLine Database (GOLD) (S. Mukherjee, D. Stamatis, J. Bertsch, G. Ovchinnikova, et al., Nucleic Acids Res, 49:D723–D733, 2021, https://doi.org/10.1093/nar/gkaa983).
IMPORTANCE The DOE JGI Metagenome Workflow is designed for processing metagenomic data sets starting from Illumina fastq files. It performs data preprocessing, error correction, assembly, structural and functional annotation, and binning. The results of processing are provided in several standard formats, such as fasta and gff, and can be used for subsequent integration into the Integrated Microbial Genomes and Microbiomes (IMG/M) system where they can be compared to a comprehensive set of publicly available metagenomes. As of 30 July 2020, 7,155 JGI metagenomes have been processed by the DOE JGI Metagenome Workflow. Here, we present a metagenome workflow developed at the JGI that generates rich data in standard formats and has been optimized for downstream analyses ranging from assessment of the functional and taxonomic composition of microbial communities to genome-resolved metagenomics and the identification and characterization of novel taxa. This workflow is currently being used to analyze thousands of metagenomic data sets in a consistent and standardized manner.
format Online
Article
Text
id pubmed-8269246
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-8269246 2021-08-02 DOE JGI Metagenome Workflow mSystems Methods and Protocols American Society for Microbiology 2021-05-18 /pmc/articles/PMC8269246/ /pubmed/34006627 http://dx.doi.org/10.1128/mSystems.00804-20 Text en https://doi.org/10.1128/AuthorWarrantyLicense.v1 This is a work of the U.S. Government and is not subject to copyright protection in the United States. Foreign copyrights may apply.
title DOE JGI Metagenome Workflow
topic Methods and Protocols
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8269246/
https://www.ncbi.nlm.nih.gov/pubmed/34006627
http://dx.doi.org/10.1128/mSystems.00804-20