A modular metagenomics analysis system for integrated multi-step data exploration

Bibliographic Details
Main authors: Mak, Lauren; Tierney, Braden; Ronkowski, Cynthia; Toomey, Michael; Martinez, Juan Sebastian Andrade; Zimmerman, Sam; Fu, Chenlian; Kopbayeva, Malika; Noyvert, Anna; Farthing, Brett; Tang, Shuiquan; Mason, Christopher; Hajirasouliha, Iman
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10104186/
https://www.ncbi.nlm.nih.gov/pubmed/37066359
http://dx.doi.org/10.1101/2023.04.09.536171
Description
Summary: MOTIVATION: Computational analysis of large-scale metagenomics sequencing datasets has proved incredibly valuable for extracting isolate-level taxonomic and functional insights from complex microbial communities. However, thanks to an ever-expanding ecosystem of metagenomics-specific algorithms and file formats, designing studies, implementing seamless and scalable end-to-end workflows, and exploring the massive amounts of output data have become studies unto themselves. Furthermore, there is little inter-communication between output data produced for different analytic purposes, such as short-read classification and metagenome-assembled genome (MAG) reconstruction. One-click pipelines have helped to organize these tools into targeted workflows, but they suffer from general compatibility and maintainability issues. RESULTS: To address the gap in easily extensible yet robustly distributable metagenomics workflows, we have developed a module-based metagenomics analysis system written in Snakemake, a popular workflow management system, along with a standardized module and working directory architecture. Each module can be run independently or conjointly with a series of others to produce the target data format (e.g., short-read preprocessing alone, or short-read preprocessing followed by de novo assembly), and outputs aggregated summary statistics reports and semi-guided Jupyter notebook-based visualizations. The module system is a bioinformatics-optimized scaffold designed to be rapidly iterated upon by the research community at large. AVAILABILITY: The module template as well as the modules described below can be found at https://github.com/MetaSUB-CAMP.