
MULTICOM2 open-source protein structure prediction system powered by deep learning and distance prediction

Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, there are still few open-source comprehensive protein structure prediction packages publicly available in the field. In this paper, we present our latest open-source protein tertiary str...


Bibliographic Details
Main authors: Wu, Tianqi; Liu, Jian; Guo, Zhiye; Hou, Jie; Cheng, Jianlin
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8222248/
https://www.ncbi.nlm.nih.gov/pubmed/34162922
http://dx.doi.org/10.1038/s41598-021-92395-6
collection PubMed
description Protein structure prediction is an important problem in bioinformatics and has been studied for decades. However, few comprehensive open-source protein structure prediction packages are publicly available. In this paper, we present our latest open-source protein tertiary structure prediction system, MULTICOM2, which integrates template-based modeling (TBM) and template-free modeling (FM) methods. The template-based modeling uses sequence alignment tools with deep multiple sequence alignments to search for structural templates, which is much faster and more accurate than the template search in MULTICOM1. The template-free (ab initio or de novo) modeling uses the inter-residue distances predicted by DeepDist to reconstruct tertiary structure models without using any known structure as a template. In the blind CASP14 experiment, the average TM-score of the models predicted by our server predictor based on the MULTICOM2 system is 0.720 for 58 TBM (regular) domains and 0.514 for 38 FM and FM/TBM (hard) domains, indicating that MULTICOM2 can predict good tertiary structures for both regular and hard domains. It predicts the correct fold for 76 CASP14 domains (95% of regular domains and 55% of hard domains) when only one prediction is made per domain. The success rate increases by 3% for both regular and hard domains when five predictions are made per domain. Moreover, the prediction accuracy of the pure template-free modeling method on both TBM and FM targets is very close to that of the combination of template-based and template-free modeling methods. This demonstrates that distance-based template-free modeling powered by deep learning can largely replace traditional template-based modeling, even on TBM targets that TBM methods used to dominate, and therefore provides a uniform structure modeling approach for any protein.
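The abstract says template-free modeling turns DeepDist's predicted inter-residue distances into 3D models, but it does not say how. As a minimal sketch of the underlying idea only, classical multidimensional scaling (a swapped-in textbook technique, not necessarily the method MULTICOM2 uses) recovers coordinates, up to rotation, reflection and translation, from a complete and exact distance matrix. Predicted distances are partial and noisy, so practical pipelines typically treat them as restraints in an optimization instead. The helper names `coords_from_distances` and `pairwise_distances` are hypothetical.

```python
import numpy as np


def coords_from_distances(dist: np.ndarray, dim: int = 3) -> np.ndarray:
    """Classical multidimensional scaling (hypothetical helper, not MULTICOM2 code).

    Recovers coordinates, up to rotation, reflection and translation, from a full
    symmetric matrix of pairwise Euclidean distances.
    """
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain the Gram matrix B = -1/2 J D^2 J.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Keep the `dim` largest eigenpairs of the symmetric Gram matrix.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:dim]
    # Clip tiny negative eigenvalues caused by round-off or noisy distances.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def pairwise_distances(xyz: np.ndarray) -> np.ndarray:
    """Full matrix of Euclidean distances between all pairs of points."""
    return np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)


# Usage: random points stand in for the CA atoms of a real protein.
rng = np.random.default_rng(0)
true_xyz = rng.normal(scale=10.0, size=(50, 3))
model_xyz = coords_from_distances(pairwise_distances(true_xyz))
print(np.allclose(pairwise_distances(true_xyz), pairwise_distances(model_xyz)))  # True for exact, noise-free distances
```

The reconstructed model may be a mirror image of the original, because distances alone cannot distinguish chirality; restraint-based folding methods resolve this with additional geometric terms.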
Finally, on the 38 CASP14 FM and FM/TBM hard domains, the MULTICOM2 server predictors (MULTICOM-HYBRID, MULTICOM-DEEP, MULTICOM-DIST) ranked among the top 20 automated server predictors in the CASP14 experiment. When multiple predictors from the same research group are combined into one entry, MULTICOM-HYBRID ranked 5th. The source code of MULTICOM2 is freely available at https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
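To make the reported accuracy figures concrete, here is a minimal sketch that computes a TM-score for a model already residue-matched and superimposed on the native structure, using the standard length-dependent scale d0 = 1.24 * (L - 15)^(1/3) - 1.8 Å. The full metric also searches for the superposition that maximizes the score, which this sketch omits. It also counts "correct folds" under the common TM-score > 0.5 convention (an assumption, since the record does not state the threshold) and checks that 95% of 58 regular plus 55% of 38 hard domains is consistent with the 76 domains reported. The function names are hypothetical.

```python
import numpy as np


def tm_score(model: np.ndarray, native: np.ndarray) -> float:
    """TM-score for residue-matched CA coordinates that are already superimposed.

    Hypothetical helper: the real TM-score also optimizes the superposition.
    """
    length = native.shape[0]  # normalize by the length of the native (target) structure
    # Length-dependent distance scale in Angstroms, floored at 0.5 for very short chains.
    d0 = max(1.24 * np.cbrt(length - 15) - 1.8, 0.5)
    d = np.linalg.norm(model - native, axis=1)  # per-residue deviation in Angstroms
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / length)


def correct_fold_count(scores, threshold=0.5):
    """Count domains whose TM-score exceeds a fold-level threshold (assumed 0.5)."""
    return sum(1 for s in scores if s > threshold)


# Consistency check of the reported counts.
regular_hits = round(0.95 * 58)  # 55 regular domains
hard_hits = round(0.55 * 38)  # 21 hard domains
print(regular_hits + hard_hits)  # 76
```

A model identical to the native structure scores exactly 1.0. Because the score is normalized by the native length, it stays comparable across domains of different sizes, which is why per-domain averages such as 0.720 and 0.514 are meaningful.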
id pubmed-8222248
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8222248 2021-06-24 Sci Rep Article. Nature Publishing Group UK 2021-06-23 /pmc/articles/PMC8222248/ /pubmed/34162922 http://dx.doi.org/10.1038/s41598-021-92395-6 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
topic Article