Res-Dom: predicting protein domain boundary from sequence using deep residual network and Bi-LSTM

MOTIVATION: Protein domains are the basic units of proteins that can fold, function and evolve independently. Protein domain boundary partition plays an important role in protein structure prediction, understanding their biological functions, annotating their evolutionary mechanisms and protein desi...

Bibliographic Details
Main authors: Wang, Lei, Zhong, Haolin, Xue, Zhidong, Wang, Yan
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710680/
https://www.ncbi.nlm.nih.gov/pubmed/36699417
http://dx.doi.org/10.1093/bioadv/vbac060
author Wang, Lei
Zhong, Haolin
Xue, Zhidong
Wang, Yan
collection PubMed
description MOTIVATION: Protein domains are the basic units of proteins that can fold, function and evolve independently. Partitioning a protein at its domain boundaries plays an important role in protein structure prediction, in understanding biological function, in annotating evolutionary mechanisms and in protein design. Although many methods for predicting domain boundaries from protein sequence have been developed over the past two decades, there is still much room for improvement.
RESULTS: This article presents Res-Dom, a domain boundary prediction tool based on a deep residual network, bidirectional long short-term memory (Bi-LSTM) and transfer learning. Deep residual neural networks extract higher-order residue-related information. A pre-trained protein language model, ESM, supplies sequence embedding features that summarize sequence context more richly. To improve the global representation of the deep residual networks, a Bi-LSTM network models long-range interactions between residues. On an independent test set of 342 proteins, Res-Dom classified proteins as single-domain or multi-domain with a Matthews correlation coefficient of 0.668, which was 17.6% higher than the second-best compared method. For domain boundaries, the normalized domain overlap score of Res-Dom was 0.849, which was 5% higher than the second-best compared method. Furthermore, Res-Dom required significantly less time than most recently developed state-of-the-art domain prediction methods.
AVAILABILITY AND IMPLEMENTATION: All source code, datasets and the model are available at http://isyslab.info/Res-Dom/.
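The single-domain versus multi-domain result above is reported as a Matthews correlation coefficient (MCC). As a minimal sketch of how that metric is computed from binary labels, the following uses the standard MCC formula; the function name, the label encoding (1 = multi-domain, 0 = single-domain) and the toy labels are illustrative assumptions and are not taken from the Res-Dom code or data.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Standard binary MCC; labels are assumed to be 1 (multi-domain) or 0 (single-domain)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is taken as 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy labels for eight hypothetical proteins (not real data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
mcc = matthews_corrcoef(y_true, y_pred)  # tp=3, tn=3, fp=1, fn=1 -> 8 / 16 = 0.5
```

MCC ranges from -1 to 1 and stays informative when the two classes are imbalanced, which is one common reason to prefer it over plain accuracy for this kind of classification.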
format Online
Article
Text
id pubmed-9710680
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9710680 2023-01-24 Res-Dom: predicting protein domain boundary from sequence using deep residual network and Bi-LSTM Wang, Lei Zhong, Haolin Xue, Zhidong Wang, Yan Bioinform Adv Original paper
Oxford University Press 2022-09-01 /pmc/articles/PMC9710680/ /pubmed/36699417 http://dx.doi.org/10.1093/bioadv/vbac060 Text en © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Res-Dom: predicting protein domain boundary from sequence using deep residual network and Bi-LSTM
topic Original paper