SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model

Bibliographic Details
Main Authors: Singh, Jaspreet; Litfin, Thomas; Singh, Jaswinder; Paliwal, Kuldip; Zhou, Yaoqi
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9113311/
https://www.ncbi.nlm.nih.gov/pubmed/35104320
http://dx.doi.org/10.1093/bioinformatics/btac053
_version_ 1784709560650432512
author Singh, Jaspreet
Litfin, Thomas
Singh, Jaswinder
Paliwal, Kuldip
Zhou, Yaoqi
collection PubMed
description MOTIVATION: Accurate prediction of protein contact-map is essential for accurate protein structure and function prediction. As a result, many methods have been developed for protein contact map prediction. However, most methods rely on protein-sequence-evolutionary information, which may not exist for many proteins due to lack of naturally occurring homologous sequences. Moreover, generating evolutionary profiles is computationally intensive. Here, we developed a contact-map predictor utilizing the output of a pre-trained language model ESM-1b as an input along with a large training set and an ensemble of residual neural networks. RESULTS: We showed that the proposed method makes a significant improvement over a single-sequence-based predictor SSCpred with 15% improvement in the F1-score for the independent CASP14-FM test set. It also outperforms evolutionary-profile-based methods trRosetta and SPOT-Contact with 48.7% and 48.5% respective improvement in the F1-score on the proteins without homologs (Neff = 1) in the independent SPOT-2018 set. The new method provides a much faster and reasonably accurate alternative to evolution-based methods, useful for large-scale prediction. AVAILABILITY AND IMPLEMENTATION: Stand-alone-version of SPOT-Contact-LM is available at https://github.com/jas-preet/SPOT-Contact-Single. Direct prediction can also be made at https://sparks-lab.org/server/spot-contact-single. The datasets used in this research can also be downloaded from the GitHub. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9113311
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9113311 2022-05-18 Bioinformatics Original Papers Oxford University Press 2022-02-01 /pmc/articles/PMC9113311/ /pubmed/35104320 http://dx.doi.org/10.1093/bioinformatics/btac053 Text en © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model
topic Original Papers