SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model
MOTIVATION: Accurate prediction of protein contact-map is essential for accurate protein structure and function prediction. As a result, many methods have been developed for protein contact map prediction. However, most methods rely on protein-sequence-evolutionary information, which may not exist f...
Main authors: | Singh, Jaspreet; Litfin, Thomas; Singh, Jaswinder; Paliwal, Kuldip; Zhou, Yaoqi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9113311/ https://www.ncbi.nlm.nih.gov/pubmed/35104320 http://dx.doi.org/10.1093/bioinformatics/btac053 |
_version_ | 1784709560650432512 |
---|---|
author | Singh, Jaspreet; Litfin, Thomas; Singh, Jaswinder; Paliwal, Kuldip; Zhou, Yaoqi |
author_facet | Singh, Jaspreet; Litfin, Thomas; Singh, Jaswinder; Paliwal, Kuldip; Zhou, Yaoqi |
author_sort | Singh, Jaspreet |
collection | PubMed |
description | MOTIVATION: Accurate prediction of protein contact-map is essential for accurate protein structure and function prediction. As a result, many methods have been developed for protein contact map prediction. However, most methods rely on protein-sequence-evolutionary information, which may not exist for many proteins due to lack of naturally occurring homologous sequences. Moreover, generating evolutionary profiles is computationally intensive. Here, we developed a contact-map predictor utilizing the output of a pre-trained language model ESM-1b as an input along with a large training set and an ensemble of residual neural networks. RESULTS: We showed that the proposed method makes a significant improvement over a single-sequence-based predictor SSCpred with 15% improvement in the F1-score for the independent CASP14-FM test set. It also outperforms evolutionary-profile-based methods trRosetta and SPOT-Contact with 48.7% and 48.5% respective improvement in the F1-score on the proteins without homologs (Neff = 1) in the independent SPOT-2018 set. The new method provides a much faster and reasonably accurate alternative to evolution-based methods, useful for large-scale prediction. AVAILABILITY AND IMPLEMENTATION: Stand-alone-version of SPOT-Contact-LM is available at https://github.com/jas-preet/SPOT-Contact-Single. Direct prediction can also be made at https://sparks-lab.org/server/spot-contact-single. The datasets used in this research can also be downloaded from the GitHub. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-9113311 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-91133112022-05-18 SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model Singh, Jaspreet Litfin, Thomas Singh, Jaswinder Paliwal, Kuldip Zhou, Yaoqi Bioinformatics Original Papers MOTIVATION: Accurate prediction of protein contact-map is essential for accurate protein structure and function prediction. As a result, many methods have been developed for protein contact map prediction. However, most methods rely on protein-sequence-evolutionary information, which may not exist for many proteins due to lack of naturally occurring homologous sequences. Moreover, generating evolutionary profiles is computationally intensive. Here, we developed a contact-map predictor utilizing the output of a pre-trained language model ESM-1b as an input along with a large training set and an ensemble of residual neural networks. RESULTS: We showed that the proposed method makes a significant improvement over a single-sequence-based predictor SSCpred with 15% improvement in the F1-score for the independent CASP14-FM test set. It also outperforms evolutionary-profile-based methods trRosetta and SPOT-Contact with 48.7% and 48.5% respective improvement in the F1-score on the proteins without homologs (Neff = 1) in the independent SPOT-2018 set. The new method provides a much faster and reasonably accurate alternative to evolution-based methods, useful for large-scale prediction. AVAILABILITY AND IMPLEMENTATION: Stand-alone-version of SPOT-Contact-LM is available at https://github.com/jas-preet/SPOT-Contact-Single. Direct prediction can also be made at https://sparks-lab.org/server/spot-contact-single. The datasets used in this research can also be downloaded from the GitHub. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. 
Oxford University Press 2022-02-01 /pmc/articles/PMC9113311/ /pubmed/35104320 http://dx.doi.org/10.1093/bioinformatics/btac053 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Singh, Jaspreet Litfin, Thomas Singh, Jaswinder Paliwal, Kuldip Zhou, Yaoqi SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model |
title | SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model |
title_full | SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model |
title_fullStr | SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model |
title_full_unstemmed | SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model |
title_short | SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model |
title_sort | spot-contact-lm: improving single-sequence-based prediction of protein contact map using a transformer language model |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9113311/ https://www.ncbi.nlm.nih.gov/pubmed/35104320 http://dx.doi.org/10.1093/bioinformatics/btac053 |
work_keys_str_mv | AT singhjaspreet spotcontactlmimprovingsinglesequencebasedpredictionofproteincontactmapusingatransformerlanguagemodel AT litfinthomas spotcontactlmimprovingsinglesequencebasedpredictionofproteincontactmapusingatransformerlanguagemodel AT singhjaswinder spotcontactlmimprovingsinglesequencebasedpredictionofproteincontactmapusingatransformerlanguagemodel AT paliwalkuldip spotcontactlmimprovingsinglesequencebasedpredictionofproteincontactmapusingatransformerlanguagemodel AT zhouyaoqi spotcontactlmimprovingsinglesequencebasedpredictionofproteincontactmapusingatransformerlanguagemodel |