Multi-head attention-based U-Nets for predicting protein domain boundaries using 1D sequence features and 2D distance maps
The information about the domain architecture of proteins is useful for studying protein structure and function. However, accurate prediction of protein domain boundaries (i.e., sequence regions separating two domains) from sequence remains a significant challenge. In this work, we develop a deep le...
Main authors: | Mahmud, Sajid; Guo, Zhiye; Quadir, Farhan; Liu, Jian; Cheng, Jianlin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9295499/ https://www.ncbi.nlm.nih.gov/pubmed/35854211 http://dx.doi.org/10.1186/s12859-022-04829-1 |
_version_ | 1784750069562474496 |
---|---|
author | Mahmud, Sajid Guo, Zhiye Quadir, Farhan Liu, Jian Cheng, Jianlin |
author_facet | Mahmud, Sajid Guo, Zhiye Quadir, Farhan Liu, Jian Cheng, Jianlin |
author_sort | Mahmud, Sajid |
collection | PubMed |
description | The information about the domain architecture of proteins is useful for studying protein structure and function. However, accurate prediction of protein domain boundaries (i.e., sequence regions separating two domains) from sequence remains a significant challenge. In this work, we develop a deep learning method based on multi-head U-Nets (called DistDom) to predict protein domain boundaries using 1D sequence features and a predicted 2D inter-residue distance map as input. The 1D features contain the evolutionary and physicochemical information of protein sequences, whereas the 2D distance map carries structural information that was rarely used in domain boundary prediction before. The 1D and 2D features are processed by 1D and 2D U-Nets, respectively, to generate hidden features. Multi-head attention then uses the hidden features to predict the probability that each residue of a protein lies in a domain boundary, leveraging both local and global information in the features. The residue-level domain boundary predictions can also be used to classify proteins as single-domain or multi-domain. DistDom classifies the CASP14 single-domain and multi-domain targets with an accuracy of 75.9%, 13.28% more accurate than the state-of-the-art method. On the CASP14 multi-domain targets with expert-annotated domain boundaries, the average per-target F1 score of DistDom's domain boundary prediction is 0.263, 29.56% higher than the state-of-the-art method. |
format | Online Article Text |
id | pubmed-9295499 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9295499 2022-07-20 Multi-head attention-based U-Nets for predicting protein domain boundaries using 1D sequence features and 2D distance maps Mahmud, Sajid Guo, Zhiye Quadir, Farhan Liu, Jian Cheng, Jianlin BMC Bioinformatics Research BioMed Central 2022-07-19 /pmc/articles/PMC9295499/ /pubmed/35854211 http://dx.doi.org/10.1186/s12859-022-04829-1 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Mahmud, Sajid Guo, Zhiye Quadir, Farhan Liu, Jian Cheng, Jianlin Multi-head attention-based U-Nets for predicting protein domain boundaries using 1D sequence features and 2D distance maps |
title | Multi-head attention-based U-Nets for predicting protein domain boundaries using 1D sequence features and 2D distance maps |
title_sort | multi-head attention-based u-nets for predicting protein domain boundaries using 1d sequence features and 2d distance maps |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9295499/ https://www.ncbi.nlm.nih.gov/pubmed/35854211 http://dx.doi.org/10.1186/s12859-022-04829-1 |
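
The description above says that residue-level boundary probabilities can be used to classify a protein as single-domain or multi-domain, and that boundary prediction is scored with an average per-target F1. The following is a minimal Python sketch of those two steps, not the DistDom implementation. The 0.5 threshold, the rule "any residue at or above the threshold means multi-domain", exact-position matching of boundary residues, and all function names are assumptions made for illustration; the published evaluation may use a different decision rule or a tolerance window around true boundaries.

```python
from typing import List, Sequence, Tuple


def boundary_residues(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Indices of residues whose predicted boundary probability is >= threshold.

    The threshold value is an illustrative assumption.
    """
    return [i for i, p in enumerate(probs) if p >= threshold]


def classify_protein(probs: Sequence[float], threshold: float = 0.5) -> str:
    """Call a protein multi-domain if at least one residue is predicted as boundary.

    This decision rule is an assumption, not the rule from the paper.
    """
    return "multi-domain" if boundary_residues(probs, threshold) else "single-domain"


def f1_score(predicted: Sequence[int], true: Sequence[int]) -> float:
    """F1 over boundary residue positions, using exact index matches (assumed)."""
    pred_set, true_set = set(predicted), set(true)
    true_positives = len(pred_set & true_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(true_set)
    return 2 * precision * recall / (precision + recall)


def mean_per_target_f1(
    targets: Sequence[Tuple[Sequence[float], Sequence[int]]],
    threshold: float = 0.5,
) -> float:
    """Average the F1 score over targets given as (probabilities, true boundary indices)."""
    scores = [f1_score(boundary_residues(p, threshold), t) for p, t in targets]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy probabilities for a five-residue sequence (hypothetical data).
    probs = [0.1, 0.2, 0.7, 0.9, 0.3]
    print(classify_protein(probs))
    print(f1_score(boundary_residues(probs), [3, 4]))
```

Averaging per target, rather than pooling residues across all targets, gives every protein equal weight regardless of its length.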