Res-Dom: predicting protein domain boundary from sequence using deep residual network and Bi-LSTM
MOTIVATION: Protein domains are the basic units of proteins that can fold, function and evolve independently. Protein domain boundary partition plays an important role in protein structure prediction, understanding their biological functions, annotating their evolutionary mechanisms and protein desi...
Main authors: | Wang, Lei; Zhong, Haolin; Xue, Zhidong; Wang, Yan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710680/ https://www.ncbi.nlm.nih.gov/pubmed/36699417 http://dx.doi.org/10.1093/bioadv/vbac060 |
_version_ | 1784841417754935296 |
---|---|
author | Wang, Lei Zhong, Haolin Xue, Zhidong Wang, Yan |
author_facet | Wang, Lei Zhong, Haolin Xue, Zhidong Wang, Yan |
author_sort | Wang, Lei |
collection | PubMed |
description | MOTIVATION: Protein domains are the basic units of proteins that can fold, function and evolve independently. Protein domain boundary partition plays an important role in protein structure prediction, understanding their biological functions, annotating their evolutionary mechanisms and protein design. Although there are many methods that have been developed to predict domain boundaries from protein sequence over the past two decades, there is still much room for improvement. RESULTS: In this article, a novel domain boundary prediction tool called Res-Dom was developed, which is based on a deep residual network, bidirectional long short-term memory (Bi-LSTM) and transfer learning. We used deep residual neural networks to extract higher-order residue-related information. In addition, we also used a pre-trained protein language model called ESM to extract sequence embedded features, which can summarize sequence context information more abundantly. To improve the global representation of these deep residual networks, a Bi-LSTM network was also designed to consider long-range interactions between residues. Res-Dom was then tested on an independent test set including 342 proteins and generated correct single-domain and multi-domain classifications with a Matthews correlation coefficient of 0.668, which was 17.6% higher than the second-best compared method. For domain boundaries, the normalized domain overlapping score of Res-Dom was 0.849, which was 5% higher than the second-best compared method. Furthermore, Res-Dom required significantly less time than most of the recently developed state-of-the-art domain prediction methods. AVAILABILITY AND IMPLEMENTATION: All source code, datasets and model are available at http://isyslab.info/Res-Dom/. |
format | Online Article Text |
id | pubmed-9710680 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9710680 2023-01-24 Res-Dom: predicting protein domain boundary from sequence using deep residual network and Bi-LSTM Wang, Lei Zhong, Haolin Xue, Zhidong Wang, Yan Bioinform Adv Original paper MOTIVATION: Protein domains are the basic units of proteins that can fold, function and evolve independently. Protein domain boundary partition plays an important role in protein structure prediction, understanding their biological functions, annotating their evolutionary mechanisms and protein design. Although there are many methods that have been developed to predict domain boundaries from protein sequence over the past two decades, there is still much room for improvement. RESULTS: In this article, a novel domain boundary prediction tool called Res-Dom was developed, which is based on a deep residual network, bidirectional long short-term memory (Bi-LSTM) and transfer learning. We used deep residual neural networks to extract higher-order residue-related information. In addition, we also used a pre-trained protein language model called ESM to extract sequence embedded features, which can summarize sequence context information more abundantly. To improve the global representation of these deep residual networks, a Bi-LSTM network was also designed to consider long-range interactions between residues. Res-Dom was then tested on an independent test set including 342 proteins and generated correct single-domain and multi-domain classifications with a Matthews correlation coefficient of 0.668, which was 17.6% higher than the second-best compared method. For domain boundaries, the normalized domain overlapping score of Res-Dom was 0.849, which was 5% higher than the second-best compared method. Furthermore, Res-Dom required significantly less time than most of the recently developed state-of-the-art domain prediction methods. AVAILABILITY AND IMPLEMENTATION: All source code, datasets and model are available at http://isyslab.info/Res-Dom/.
Oxford University Press 2022-09-01 /pmc/articles/PMC9710680/ /pubmed/36699417 http://dx.doi.org/10.1093/bioadv/vbac060 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original paper Wang, Lei Zhong, Haolin Xue, Zhidong Wang, Yan Res-Dom: predicting protein domain boundary from sequence using deep residual network and Bi-LSTM |
title | Res-Dom: predicting protein domain boundary from sequence using deep residual network and Bi-LSTM |
title_sort | res-dom: predicting protein domain boundary from sequence using deep residual network and bi-lstm |
topic | Original paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710680/ https://www.ncbi.nlm.nih.gov/pubmed/36699417 http://dx.doi.org/10.1093/bioadv/vbac060 |
work_keys_str_mv | AT wanglei resdompredictingproteindomainboundaryfromsequenceusingdeepresidualnetworkandbilstm AT zhonghaolin resdompredictingproteindomainboundaryfromsequenceusingdeepresidualnetworkandbilstm AT xuezhidong resdompredictingproteindomainboundaryfromsequenceusingdeepresidualnetworkandbilstm AT wangyan resdompredictingproteindomainboundaryfromsequenceusingdeepresidualnetworkandbilstm |