
DEEPre: sequence-based enzyme EC number prediction by deep learning

MOTIVATION: Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resource required make it prohibitively expensive to experimentally determine the function of every e...


Bibliographic Details
Main authors: Li, Yu; Wang, Sheng; Umarov, Ramzan; Xie, Bingqing; Fan, Ming; Li, Lihua; Gao, Xin
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6030869/
https://www.ncbi.nlm.nih.gov/pubmed/29069344
http://dx.doi.org/10.1093/bioinformatics/btx680
_version_ 1783337211622588416
author Li, Yu
Wang, Sheng
Umarov, Ramzan
Xie, Bingqing
Fan, Ming
Li, Lihua
Gao, Xin
author_sort Li, Yu
collection PubMed
description MOTIVATION: Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resource required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number. RESULTS: We propose an end-to-end feature selection and classification model training approach, as well as an automatic and robust feature dimensionality uniformization method, DEEPre, in the field of enzyme function prediction. Instead of extracting manually crafted features from enzyme sequences, our model takes the raw sequence encoding as inputs, extracting convolutional and sequential features from the raw encoding based on the classification result to directly improve the prediction performance. The thorough cross-fold validation experiments conducted on two large-scale datasets show that DEEPre improves the prediction performance over the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre’s ability to capture the functional difference of enzyme isoforms. AVAILABILITY AND IMPLEMENTATION: The server could be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6030869
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6030869 2018-07-10 DEEPre: sequence-based enzyme EC number prediction by deep learning Li, Yu Wang, Sheng Umarov, Ramzan Xie, Bingqing Fan, Ming Li, Lihua Gao, Xin Bioinformatics Original Papers
Oxford University Press 2018-03-01 2017-10-23 /pmc/articles/PMC6030869/ /pubmed/29069344 http://dx.doi.org/10.1093/bioinformatics/btx680 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title DEEPre: sequence-based enzyme EC number prediction by deep learning
topic Original Papers