Improving deep learning-based protein distance prediction in CASP14

MOTIVATION: Accurate prediction of residue–residue distances is important for protein structure prediction. We developed several protein distance predictors based on a deep learning distance prediction method and blindly tested them in the 14th Critical Assessment of Protein Structure Prediction (CA...

Bibliographic Details
Main authors: Guo, Zhiye; Wu, Tianqi; Liu, Jian; Hou, Jie; Cheng, Jianlin
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8504632/
https://www.ncbi.nlm.nih.gov/pubmed/33961009
http://dx.doi.org/10.1093/bioinformatics/btab355
author Guo, Zhiye
Wu, Tianqi
Liu, Jian
Hou, Jie
Cheng, Jianlin
collection PubMed
description MOTIVATION: Accurate prediction of residue–residue distances is important for protein structure prediction. We developed several protein distance predictors based on a deep learning distance prediction method and blindly tested them in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The prediction method uses deep residual neural networks with the channel-wise attention mechanism to classify the distance between every two residues into multiple distance intervals. The input features for the deep learning method include co-evolutionary features as well as other sequence-based features derived from multiple sequence alignments (MSAs). Three alignment methods are used with multiple protein sequence/profile databases to generate MSAs for input feature generation. Based on different configurations and training strategies of the deep learning method, five MULTICOM distance predictors were created to participate in the CASP14 experiment. RESULTS: Benchmarked on 37 hard CASP14 domains, the best performing MULTICOM predictor is ranked 5th out of 30 automated CASP14 distance prediction servers in terms of precision of top L/5 long-range contact predictions [i.e. classifying distances between two residues into two categories: in contact (<8 Angstrom) and not in contact otherwise] and performs better than the best CASP13 distance prediction method. The best performing MULTICOM predictor is also ranked 6th among automated server predictors in classifying inter-residue distances into 10 distance intervals defined by CASP14 according to the precision of distance classification. The results show that the quality and depth of MSAs depend on alignment methods and sequence databases and have a significant impact on the accuracy of distance prediction. Using larger training datasets and multiple complementary features improves prediction accuracy. 
However, the number of effective sequences in MSAs is only a weak indicator of the quality of MSAs and the accuracy of predicted distance maps. In contrast, there is a strong correlation between the accuracy of contact/distance predictions and the average probability of the predicted contacts, which can therefore be more effectively used to estimate the confidence of distance predictions and select predicted distance maps. AVAILABILITY AND IMPLEMENTATION: The software package, source code and data of DeepDist2 are freely available at https://github.com/multicom-toolbox/deepdist and https://zenodo.org/record/4712084#.YIIM13VKhQM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
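The evaluation described in the abstract above (precision of the top L/5 long-range contact predictions, with a contact defined as a distance below 8 Angstrom, and the average predicted probability as a confidence estimate) can be sketched as follows. This is a minimal illustration, not code from DeepDist2: the function names are hypothetical, and the minimum sequence separation of 24 residues for "long-range" pairs is an assumption based on the usual CASP convention, since the record itself does not state it.

```python
import numpy as np


def top_l5_long_range(prob, min_sep=24):
    """Pick the top L/5 long-range residue pairs by predicted contact probability.

    prob    : (L, L) array of predicted contact probabilities.
    min_sep : minimum sequence separation j - i (assumed value, not from the record).
    """
    L = prob.shape[0]
    # Upper-triangle pairs (i, j) with j - i >= min_sep.
    i, j = np.triu_indices(L, k=min_sep)
    scores = prob[i, j]
    # Sort by descending probability and keep the first L/5 pairs.
    order = np.argsort(-scores, kind="stable")[: max(1, L // 5)]
    return i[order], j[order], scores[order]


def precision_and_confidence(prob, true_dist, threshold=8.0, min_sep=24):
    """Return (precision, confidence) for the top L/5 long-range predictions.

    precision  : fraction of selected pairs whose true distance is < threshold.
    confidence : mean predicted probability of the selected pairs.
    """
    i, j, scores = top_l5_long_range(prob, min_sep)
    precision = float(np.mean(true_dist[i, j] < threshold))
    confidence = float(np.mean(scores))
    return precision, confidence
```

Given a predicted probability map and the true distance map of a solved structure, `precision_and_confidence(prob, true_dist)` returns both quantities; the confidence value needs no true structure, which is why it could be used to rank candidate distance maps before the answer is known.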
format Online
Article
Text
id pubmed-8504632
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8504632 2021-10-13 Improving deep learning-based protein distance prediction in CASP14 Guo, Zhiye Wu, Tianqi Liu, Jian Hou, Jie Cheng, Jianlin Bioinformatics Original Papers Oxford University Press 2021-05-07 /pmc/articles/PMC8504632/ /pubmed/33961009 http://dx.doi.org/10.1093/bioinformatics/btab355 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Improving deep learning-based protein distance prediction in CASP14
topic Original Papers