MCIC: Automated Identification of Cellulases From Metagenomic Data and Characterization Based on Temperature and pH Dependence

As the availability of high-throughput metagenomic data is increasing, agile and accurate tools are required to analyze and exploit this valuable and plentiful resource. Cellulose-degrading enzymes have various applications, and finding appropriate cellulases for different purposes is becoming incre...

Bibliographic Details
Main Authors: Foroozandeh Shahraki, Mehdi, Ariaeenejad, Shohreh, Fallah Atanaki, Fereshteh, Zolfaghari, Behrouz, Koshiba, Takeshi, Kavousi, Kaveh, Salekdeh, Ghasem Hosseini
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7645119/
https://www.ncbi.nlm.nih.gov/pubmed/33193158
http://dx.doi.org/10.3389/fmicb.2020.567863
_version_ 1783606600741683200
author Foroozandeh Shahraki, Mehdi
Ariaeenejad, Shohreh
Fallah Atanaki, Fereshteh
Zolfaghari, Behrouz
Koshiba, Takeshi
Kavousi, Kaveh
Salekdeh, Ghasem Hosseini
author_facet Foroozandeh Shahraki, Mehdi
Ariaeenejad, Shohreh
Fallah Atanaki, Fereshteh
Zolfaghari, Behrouz
Koshiba, Takeshi
Kavousi, Kaveh
Salekdeh, Ghasem Hosseini
author_sort Foroozandeh Shahraki, Mehdi
collection PubMed
description As the availability of high-throughput metagenomic data is increasing, agile and accurate tools are required to analyze and exploit this valuable and plentiful resource. Cellulose-degrading enzymes have various applications, and finding appropriate cellulases for different purposes is becoming increasingly challenging. An in silico screening method for high-throughput data can be of great assistance when combined with the characterization of thermal and pH dependence. By this means, various metagenomic sources with high cellulolytic potentials can be explored. Using a sequence similarity-based annotation and an ensemble of supervised learning algorithms, this study aims to identify and characterize cellulolytic enzymes from a given high-throughput metagenomic data based on optimum temperature and pH. The prediction performance of MCIC (metagenome cellulase identification and characterization) was evaluated through multiple iterations of sixfold cross-validation tests. This tool was also implemented for a comparative analysis of four metagenomic sources to estimate their cellulolytic profile and capabilities. For experimental validation of MCIC’s screening and prediction abilities, two identified enzymes from cattle rumen were subjected to cloning, expression, and characterization. To the best of our knowledge, this is the first time that a sequence-similarity based method is used alongside an ensemble machine learning model to identify and characterize cellulase enzymes from extensive metagenomic data. This study highlights the strength of machine learning techniques to predict enzymatic properties solely based on their sequence. MCIC is freely available as a python package and standalone toolkit for Windows and Linux-based operating systems with several functions to facilitate the screening and thermal and pH dependence prediction of cellulases.
format Online
Article
Text
id pubmed-7645119
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-76451192020-11-13 MCIC: Automated Identification of Cellulases From Metagenomic Data and Characterization Based on Temperature and pH Dependence Foroozandeh Shahraki, Mehdi Ariaeenejad, Shohreh Fallah Atanaki, Fereshteh Zolfaghari, Behrouz Koshiba, Takeshi Kavousi, Kaveh Salekdeh, Ghasem Hosseini Front Microbiol Microbiology As the availability of high-throughput metagenomic data is increasing, agile and accurate tools are required to analyze and exploit this valuable and plentiful resource. Cellulose-degrading enzymes have various applications, and finding appropriate cellulases for different purposes is becoming increasingly challenging. An in silico screening method for high-throughput data can be of great assistance when combined with the characterization of thermal and pH dependence. By this means, various metagenomic sources with high cellulolytic potentials can be explored. Using a sequence similarity-based annotation and an ensemble of supervised learning algorithms, this study aims to identify and characterize cellulolytic enzymes from a given high-throughput metagenomic data based on optimum temperature and pH. The prediction performance of MCIC (metagenome cellulase identification and characterization) was evaluated through multiple iterations of sixfold cross-validation tests. This tool was also implemented for a comparative analysis of four metagenomic sources to estimate their cellulolytic profile and capabilities. For experimental validation of MCIC’s screening and prediction abilities, two identified enzymes from cattle rumen were subjected to cloning, expression, and characterization. To the best of our knowledge, this is the first time that a sequence-similarity based method is used alongside an ensemble machine learning model to identify and characterize cellulase enzymes from extensive metagenomic data. 
This study highlights the strength of machine learning techniques to predict enzymatic properties solely based on their sequence. MCIC is freely available as a python package and standalone toolkit for Windows and Linux-based operating systems with several functions to facilitate the screening and thermal and pH dependence prediction of cellulases. Frontiers Media S.A. 2020-10-23 /pmc/articles/PMC7645119/ /pubmed/33193158 http://dx.doi.org/10.3389/fmicb.2020.567863 Text en Copyright © 2020 Foroozandeh Shahraki, Ariaeenejad, Fallah Atanaki, Zolfaghari, Koshiba, Kavousi and Salekdeh. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Microbiology
Foroozandeh Shahraki, Mehdi
Ariaeenejad, Shohreh
Fallah Atanaki, Fereshteh
Zolfaghari, Behrouz
Koshiba, Takeshi
Kavousi, Kaveh
Salekdeh, Ghasem Hosseini
MCIC: Automated Identification of Cellulases From Metagenomic Data and Characterization Based on Temperature and pH Dependence
title MCIC: Automated Identification of Cellulases From Metagenomic Data and Characterization Based on Temperature and pH Dependence
title_full MCIC: Automated Identification of Cellulases From Metagenomic Data and Characterization Based on Temperature and pH Dependence
title_fullStr MCIC: Automated Identification of Cellulases From Metagenomic Data and Characterization Based on Temperature and pH Dependence
title_full_unstemmed MCIC: Automated Identification of Cellulases From Metagenomic Data and Characterization Based on Temperature and pH Dependence
title_short MCIC: Automated Identification of Cellulases From Metagenomic Data and Characterization Based on Temperature and pH Dependence
title_sort mcic: automated identification of cellulases from metagenomic data and characterization based on temperature and ph dependence
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7645119/
https://www.ncbi.nlm.nih.gov/pubmed/33193158
http://dx.doi.org/10.3389/fmicb.2020.567863
work_keys_str_mv AT foroozandehshahrakimehdi mcicautomatedidentificationofcellulasesfrommetagenomicdataandcharacterizationbasedontemperatureandphdependence
AT ariaeenejadshohreh mcicautomatedidentificationofcellulasesfrommetagenomicdataandcharacterizationbasedontemperatureandphdependence
AT fallahatanakifereshteh mcicautomatedidentificationofcellulasesfrommetagenomicdataandcharacterizationbasedontemperatureandphdependence
AT zolfagharibehrouz mcicautomatedidentificationofcellulasesfrommetagenomicdataandcharacterizationbasedontemperatureandphdependence
AT koshibatakeshi mcicautomatedidentificationofcellulasesfrommetagenomicdataandcharacterizationbasedontemperatureandphdependence
AT kavousikaveh mcicautomatedidentificationofcellulasesfrommetagenomicdataandcharacterizationbasedontemperatureandphdependence
AT salekdehghasemhosseini mcicautomatedidentificationofcellulasesfrommetagenomicdataandcharacterizationbasedontemperatureandphdependence