
MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle

The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform d...


Bibliographic Details
Main authors: De Anda, Valerie, Zapata-Peñasco, Icoquih, Poot-Hernandez, Augusto Cesar, Eguiarte, Luis E, Contreras-Moreira, Bruno, Souza, Valeria
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737871/
https://www.ncbi.nlm.nih.gov/pubmed/29069412
http://dx.doi.org/10.1093/gigascience/gix096
author De Anda, Valerie
Zapata-Peñasco, Icoquih
Poot-Hernandez, Augusto Cesar
Eguiarte, Luis E
Contreras-Moreira, Bruno
Souza, Valeria
author_facet De Anda, Valerie
Zapata-Peñasco, Icoquih
Poot-Hernandez, Augusto Cesar
Eguiarte, Luis E
Contreras-Moreira, Bruno
Souza, Valeria
author_sort De Anda, Valerie
collection PubMed
description The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large “omic” datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of complete-sequenced sulfur genomes. Using the mathematical framework of relative entropy (H′), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operator characteristic plots, and the area under the curve metric (AUC). Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents, or deep-sea sediment.
Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa.
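The scoring idea in the description (a relative-entropy value per Pfam domain, then a per-sample score built from those values) can be illustrated with a minimal sketch. This is not the MEBS implementation: it assumes that each domain contributes one relative-entropy term, P * log2(P / Q), where P is the domain's frequency among sulfur genomes and Q its frequency in the background genome set, and that a sample's score is the sum of the values of the domains it contains. The function names, the placeholder accession PF99999, and all frequencies are hypothetical; only PF04358 (DsrC) comes from the record.

```python
import math


def domain_entropy(p_sulfur: float, q_background: float) -> float:
    """One relative-entropy term, P * log2(P / Q), for a single domain.

    Assumption: P is the domain's frequency among sulfur genomes and Q its
    frequency in the background genome set. Positive values mean the domain
    is enriched in sulfur genomes, negative values mean it is depleted.
    """
    if p_sulfur == 0.0:
        return 0.0  # limit of p * log2(p / q) as p approaches 0
    if q_background == 0.0:
        raise ValueError("background frequency must be positive when P > 0")
    return p_sulfur * math.log2(p_sulfur / q_background)


def sample_score(domains_in_sample: set[str], entropies: dict[str, float]) -> float:
    """Sum the entropy values of the scored domains found in a sample."""
    return sum(entropies.get(domain, 0.0) for domain in domains_in_sample)


# Hypothetical frequencies, invented for illustration only:
# accession -> (frequency in sulfur genomes, frequency in background).
frequencies = {
    "PF04358": (0.5, 0.125),  # DsrC, four times more frequent in sulfur genomes
    "PF99999": (0.5, 0.5),    # placeholder accession, equally frequent in both
}

entropies = {acc: domain_entropy(p, q) for acc, (p, q) in frequencies.items()}
score = sample_score({"PF04358", "PF99999"}, entropies)

print(entropies)
print(score)
```

Under these made-up numbers the enriched domain carries the whole score, while a domain that is equally common in both sets contributes nothing. That is the behavior one would want from a marker-domain candidate.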
format Online
Article
Text
id pubmed-5737871
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-57378712018-01-04 MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle De Anda, Valerie Zapata-Peñasco, Icoquih Poot-Hernandez, Augusto Cesar Eguiarte, Luis E Contreras-Moreira, Bruno Souza, Valeria Gigascience Research The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large “omic” datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of complete-sequenced sulfur genomes. Using the mathematical framework of relative entropy (H′), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operator characteristic plots, and the area under the curve metric (AUC).
Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents, or deep-sea sediment. Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa. Oxford University Press 2017-10-23 /pmc/articles/PMC5737871/ /pubmed/29069412 http://dx.doi.org/10.1093/gigascience/gix096 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
De Anda, Valerie
Zapata-Peñasco, Icoquih
Poot-Hernandez, Augusto Cesar
Eguiarte, Luis E
Contreras-Moreira, Bruno
Souza, Valeria
MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle
title MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737871/
https://www.ncbi.nlm.nih.gov/pubmed/29069412
http://dx.doi.org/10.1093/gigascience/gix096