
A multi-source domain annotation pipeline for quantitative metagenomic and metatranscriptomic functional profiling

BACKGROUND: Biochemical and regulatory pathways have until recently been thought and modelled within one cell type, one organism and one species. This vision is being dramatically changed by the advent of whole microbiome sequencing studies, revealing the role of symbiotic microbial populations in f...


Bibliographic Details
Main authors: Ugarte, Ari; Vicedomini, Riccardo; Bernardes, Juliana; Carbone, Alessandra
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6114274/
https://www.ncbi.nlm.nih.gov/pubmed/30153857
http://dx.doi.org/10.1186/s40168-018-0532-2
author Ugarte, Ari
Vicedomini, Riccardo
Bernardes, Juliana
Carbone, Alessandra
collection PubMed
description BACKGROUND: Biochemical and regulatory pathways have until recently been thought and modelled within one cell type, one organism and one species. This vision is being dramatically changed by the advent of whole microbiome sequencing studies, revealing the role of symbiotic microbial populations in fundamental biochemical functions. The new landscape we face requires the reconstruction of biochemical and regulatory pathways at the community level in a given environment. In order to understand how environmental factors affect the genetic material and the dynamics of the expression from one environment to another, we want to evaluate the quantity of gene protein sequences or transcripts associated to a given pathway by precisely estimating the abundance of protein domains, their weak presence or absence in environmental samples. RESULTS: MetaCLADE is a novel profile-based domain annotation pipeline based on a multi-source domain annotation strategy. It applies directly to reads and improves identification of the catalog of functions in microbiomes. MetaCLADE is applied to simulated data and to more than ten metagenomic and metatranscriptomic datasets from different environments where it outperforms InterProScan in the number of annotated domains. It is compared to the state-of-the-art non-profile-based and profile-based methods, UProC and HMM-GRASPx, showing complementary predictions to UProC. A combination of MetaCLADE and UProC improves even further the functional annotation of environmental samples. CONCLUSIONS: Learning about the functional activity of environmental microbial communities is a crucial step to understand microbial interactions and large-scale environmental impact. MetaCLADE has been explicitly designed for metagenomic and metatranscriptomic data and allows for the discovery of patterns in divergent sequences, thanks to its multi-source strategy. 
MetaCLADE highly improves current domain annotation methods and reaches a fine degree of accuracy in annotation of very different environments such as soil and marine ecosystems, ancient metagenomes and human tissues. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-018-0532-2) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6114274
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6114274 2018-09-04 Microbiome Research BioMed Central 2018-08-28 /pmc/articles/PMC6114274/ /pubmed/30153857 http://dx.doi.org/10.1186/s40168-018-0532-2 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A multi-source domain annotation pipeline for quantitative metagenomic and metatranscriptomic functional profiling
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6114274/
https://www.ncbi.nlm.nih.gov/pubmed/30153857
http://dx.doi.org/10.1186/s40168-018-0532-2