
Improving protein structure prediction using templates and sequence embedding

MOTIVATION: Protein structure prediction has been greatly improved by deep learning, but the contribution of different information is yet to be fully understood. This article studies the impacts of two kinds of information for structure prediction: template and multiple sequence alignment (MSA) embe...

Bibliographic Details
Main Authors:	Wu, Fandi; Jing, Xiaoyang; Luo, Xiao; Xu, Jinbo
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2022
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9805584/ https://www.ncbi.nlm.nih.gov/pubmed/36355462 http://dx.doi.org/10.1093/bioinformatics/btac723

author	Wu, Fandi; Jing, Xiaoyang; Luo, Xiao; Xu, Jinbo
collection	PubMed
description	MOTIVATION: Protein structure prediction has been greatly improved by deep learning, but the contribution of different kinds of information is yet to be fully understood. This article studies the impact of two kinds of information on structure prediction: templates and multiple sequence alignment (MSA) embedding. Templates have been used by some methods before, such as AlphaFold2, RoseTTAFold and RaptorX. AlphaFold2 and RoseTTAFold only used templates detected by HHsearch, which may not perform very well on some targets. In addition, sequence embedding generated by pre-trained protein language models has not been fully explored for structure prediction. In this article, we study the impact of templates (including the number of templates, the template quality and how the templates are generated) on protein structure prediction accuracy, especially when the templates are detected by methods other than HHsearch. We also study the impact of sequence embedding (generated by MSATransformer and ESM-1b) on structure prediction. RESULTS: We have implemented a deep learning method for protein structure prediction that may take templates and MSA embedding as extra inputs. We study the contribution of templates and MSA embedding to structure prediction accuracy. Our experimental results show that templates can improve structure prediction on 71 of 110 CASP13 (13th Critical Assessment of Structure Prediction) targets and 47 of 91 CASP14 targets, and templates are particularly useful for targets with similar templates. MSA embedding can improve structure prediction on 63 of 91 CASP14 (14th Critical Assessment of Structure Prediction) targets and 87 of 183 CAMEO targets and is particularly useful for proteins with shallow MSAs.
When both templates and MSA embedding are used, our method can predict correct folds (TMscore > 0.5) for 16 of 23 CASP14 FM targets and 14 of 18 Continuous Automated Model Evaluation (CAMEO) targets, outperforming RoseTTAFold by 5% and 7%, respectively. AVAILABILITY AND IMPLEMENTATION: Available at https://github.com/xluo233/RaptorXFold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-9805584
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-9805584 2023-01-03 Improving protein structure prediction using templates and sequence embedding Wu, Fandi; Jing, Xiaoyang; Luo, Xiao; Xu, Jinbo Bioinformatics Original Paper Oxford University Press 2022-11-10 /pmc/articles/PMC9805584/ /pubmed/36355462 http://dx.doi.org/10.1093/bioinformatics/btac723 Text en © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Improving protein structure prediction using templates and sequence embedding
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9805584/ https://www.ncbi.nlm.nih.gov/pubmed/36355462 http://dx.doi.org/10.1093/bioinformatics/btac723


Similar Items