
Improving deep learning-based protein distance prediction in CASP14

MOTIVATION: Accurate prediction of residue–residue distances is important for protein structure prediction. We developed several protein distance predictors based on a deep learning distance prediction method and blindly tested them in the 14th Critical Assessment of Protein Structure Prediction (CA...


Bibliographic Details
Main Authors:	Guo, Zhiye; Wu, Tianqi; Liu, Jian; Hou, Jie; Cheng, Jianlin
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	Original Papers
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8504632/ https://www.ncbi.nlm.nih.gov/pubmed/33961009 http://dx.doi.org/10.1093/bioinformatics/btab355

author	Guo, Zhiye; Wu, Tianqi; Liu, Jian; Hou, Jie; Cheng, Jianlin
collection	PubMed
description	MOTIVATION: Accurate prediction of residue–residue distances is important for protein structure prediction. We developed several protein distance predictors based on a deep learning distance prediction method and blindly tested them in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The prediction method uses deep residual neural networks with the channel-wise attention mechanism to classify the distance between every two residues into multiple distance intervals. The input features for the deep learning method include co-evolutionary features as well as other sequence-based features derived from multiple sequence alignments (MSAs). Three alignment methods are used with multiple protein sequence/profile databases to generate MSAs for input feature generation. Based on different configurations and training strategies of the deep learning method, five MULTICOM distance predictors were created to participate in the CASP14 experiment. RESULTS: Benchmarked on 37 hard CASP14 domains, the best performing MULTICOM predictor is ranked 5th out of 30 automated CASP14 distance prediction servers in terms of precision of top L/5 long-range contact predictions [i.e. classifying distances between two residues into two categories: in contact (<8 Angstrom) and not in contact otherwise] and performs better than the best CASP13 distance prediction method. The best performing MULTICOM predictor is also ranked 6th among automated server predictors in classifying inter-residue distances into 10 distance intervals defined by CASP14 according to the precision of distance classification. The results show that the quality and depth of MSAs depend on alignment methods and sequence databases and have a significant impact on the accuracy of distance prediction. Using larger training datasets and multiple complementary features improves prediction accuracy. However, the number of effective sequences in MSAs is only a weak indicator of the quality of MSAs and the accuracy of predicted distance maps. In contrast, there is a strong correlation between the accuracy of contact/distance predictions and the average probability of the predicted contacts, which can therefore be more effectively used to estimate the confidence of distance predictions and select predicted distance maps. AVAILABILITY AND IMPLEMENTATION: The software package, source code and data of DeepDist2 are freely available at https://github.com/multicom-toolbox/deepdist and https://zenodo.org/record/4712084#.YIIM13VKhQM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-8504632
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-8504632 2021-10-13 Improving deep learning-based protein distance prediction in CASP14 Guo, Zhiye; Wu, Tianqi; Liu, Jian; Hou, Jie; Cheng, Jianlin Bioinformatics Original Papers Oxford University Press 2021-05-07 /pmc/articles/PMC8504632/ /pubmed/33961009 http://dx.doi.org/10.1093/bioinformatics/btab355 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Improving deep learning-based protein distance prediction in CASP14
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8504632/ https://www.ncbi.nlm.nih.gov/pubmed/33961009 http://dx.doi.org/10.1093/bioinformatics/btab355
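
The abstract evaluates predictors by the precision of the top L/5 long-range contact predictions and reports that the average probability of the predicted contacts tracks accuracy. The sketch below illustrates how those two quantities could be computed from a predicted distance distribution. It is not the DeepDist2 implementation: the function names, the bin edges and the toy data are hypothetical, and the long-range cutoff (sequence separation of at least 24 residues) is the usual CASP convention rather than something stated in this record. Only the 8 Angstrom contact threshold comes from the abstract.

```python
from typing import List, Sequence, Tuple

CONTACT_CUTOFF = 8.0  # Angstrom; contact threshold given in the abstract
LONG_RANGE_SEP = 24   # assumption: usual CASP long-range separation |i - j| >= 24


def contact_probability(bin_probs: Sequence[float],
                        bin_upper_edges: Sequence[float]) -> float:
    """Sum the probabilities of all distance bins whose upper edge is <= 8 A."""
    return sum(p for p, upper in zip(bin_probs, bin_upper_edges)
               if upper <= CONTACT_CUTOFF)


def top_long_range(contact_probs: List[List[float]],
                   fraction: int = 5) -> List[Tuple[int, int, float]]:
    """Return the top L/fraction long-range pairs, ranked by contact probability."""
    length = len(contact_probs)
    pairs = [(i, j, contact_probs[i][j])
             for i in range(length)
             for j in range(i + LONG_RANGE_SEP, length)]
    pairs.sort(key=lambda t: t[2], reverse=True)
    return pairs[:max(1, length // fraction)]


def precision_and_confidence(selected: List[Tuple[int, int, float]],
                             true_dist: List[List[float]]) -> Tuple[float, float]:
    """Precision of the selected contacts and their mean predicted probability."""
    hits = sum(1 for i, j, _ in selected if true_dist[i][j] < CONTACT_CUTOFF)
    precision = hits / len(selected)
    mean_prob = sum(p for _, _, p in selected) / len(selected)
    return precision, mean_prob


def demo() -> Tuple[float, float]:
    """Toy 30-residue example with made-up probabilities and distances."""
    length = 30
    probs = [[0.0] * length for _ in range(length)]
    dists = [[20.0] * length for _ in range(length)]
    toy = [(0, 24, 0.9, 5.0), (1, 25, 0.8, 6.0), (2, 26, 0.7, 7.0),
           (3, 27, 0.6, 12.0), (4, 28, 0.5, 7.5), (5, 29, 0.4, 15.0)]
    for i, j, prob, dist in toy:
        probs[i][j] = prob
        dists[i][j] = dist
    return precision_and_confidence(top_long_range(probs), dists)


if __name__ == "__main__":
    print(demo())
```

In the toy example L = 30, so L/5 = 6 pairs are selected; four of them lie below 8 Angstrom in the made-up reference distances. The mean probability of the selected pairs is the kind of confidence estimate the abstract describes as more informative than the number of effective sequences.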

