CAMAMED: a pipeline for composition-aware mapping-based analysis of metagenomic data

Metagenomics is the study of genomic DNA recovered from a microbial community. Both assembly-based and mapping-based methods have been used to analyze metagenomic data. When appropriate gene catalogs are available, mapping-based methods are preferred over assembly-based approaches, especially for an...

Bibliographic Details
Main authors: Norouzi-Beirami, Mohammad H; Marashi, Sayed-Amir; Banaei-Moghaddam, Ali M; Kavousi, Kaveh
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7787360/
https://www.ncbi.nlm.nih.gov/pubmed/33575649
http://dx.doi.org/10.1093/nargab/lqaa107
_version_ 1783632808252538880
author Norouzi-Beirami, Mohammad H
Marashi, Sayed-Amir
Banaei-Moghaddam, Ali M
Kavousi, Kaveh
author_facet Norouzi-Beirami, Mohammad H
Marashi, Sayed-Amir
Banaei-Moghaddam, Ali M
Kavousi, Kaveh
author_sort Norouzi-Beirami, Mohammad H
collection PubMed
description Metagenomics is the study of genomic DNA recovered from a microbial community. Both assembly-based and mapping-based methods have been used to analyze metagenomic data. When appropriate gene catalogs are available, mapping-based methods are preferred over assembly-based approaches, especially for analyzing the data at the functional level. In this study, we introduce CAMAMED as a composition-aware mapping-based metagenomic data analysis pipeline. This pipeline can analyze metagenomic samples at both taxonomic and functional profiling levels. Using this pipeline, metagenome sequences can be mapped to non-redundant gene catalogs and the gene frequencies in the samples are obtained. Due to the highly compositional nature of metagenomic data, the cumulative sum-scaling method is used at both taxa and gene levels for compositional data analysis in our pipeline. Additionally, by mapping the genes to the KEGG database, annotations related to each gene can be extracted at different functional levels such as KEGG ortholog groups, enzyme commission numbers and reactions. Furthermore, the pipeline enables the user to identify potential biomarkers in case-control metagenomic samples by investigating functional differences. The source code for this software is available from https://github.com/mhnb/camamed. Ready-to-use Docker images are also available at https://hub.docker.com.
format Online
Article
Text
id pubmed-7787360
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7787360 2021-02-10 CAMAMED: a pipeline for composition-aware mapping-based analysis of metagenomic data Norouzi-Beirami, Mohammad H Marashi, Sayed-Amir Banaei-Moghaddam, Ali M Kavousi, Kaveh NAR Genom Bioinform Application Notes Metagenomics is the study of genomic DNA recovered from a microbial community. Both assembly-based and mapping-based methods have been used to analyze metagenomic data. When appropriate gene catalogs are available, mapping-based methods are preferred over assembly-based approaches, especially for analyzing the data at the functional level. In this study, we introduce CAMAMED as a composition-aware mapping-based metagenomic data analysis pipeline. This pipeline can analyze metagenomic samples at both taxonomic and functional profiling levels. Using this pipeline, metagenome sequences can be mapped to non-redundant gene catalogs and the gene frequencies in the samples are obtained. Due to the highly compositional nature of metagenomic data, the cumulative sum-scaling method is used at both taxa and gene levels for compositional data analysis in our pipeline. Additionally, by mapping the genes to the KEGG database, annotations related to each gene can be extracted at different functional levels such as KEGG ortholog groups, enzyme commission numbers and reactions. Furthermore, the pipeline enables the user to identify potential biomarkers in case-control metagenomic samples by investigating functional differences. The source code for this software is available from https://github.com/mhnb/camamed. Ready-to-use Docker images are also available at https://hub.docker.com. Oxford University Press 2021-01-06 /pmc/articles/PMC7787360/ /pubmed/33575649 http://dx.doi.org/10.1093/nargab/lqaa107 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Application Notes
Norouzi-Beirami, Mohammad H
Marashi, Sayed-Amir
Banaei-Moghaddam, Ali M
Kavousi, Kaveh
CAMAMED: a pipeline for composition-aware mapping-based analysis of metagenomic data
title CAMAMED: a pipeline for composition-aware mapping-based analysis of metagenomic data
topic Application Notes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7787360/
https://www.ncbi.nlm.nih.gov/pubmed/33575649
http://dx.doi.org/10.1093/nargab/lqaa107