Deep Learning to Predict Protein Backbone Structure from High-Resolution Cryo-EM Density Maps

Cryo-electron microscopy (cryo-EM) has become a leading technology for determining protein structures. Recent advances in this field have allowed for atomic resolution. However, predicting the backbone trace of a protein has remained a challenge on all but the most pristine density maps (<2.5 Å r...


Bibliographic Details
Main Authors: Si, Dong; Moritz, Spencer A.; Pfab, Jonas; Hou, Jie; Cao, Renzhi; Wang, Liguo; Wu, Tianqi; Cheng, Jianlin
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7063051/
https://www.ncbi.nlm.nih.gov/pubmed/32152330
http://dx.doi.org/10.1038/s41598-020-60598-y
collection PubMed
description Cryo-electron microscopy (cryo-EM) has become a leading technology for determining protein structures. Recent advances in this field have allowed for atomic resolution. However, predicting the backbone trace of a protein has remained a challenge on all but the most pristine density maps (<2.5 Å resolution). Here we introduce a deep learning model that uses a set of cascaded convolutional neural networks (CNNs) to predict Cα atoms along a protein’s backbone structure. The cascaded-CNN (C-CNN) is a novel deep learning architecture composed of multiple CNNs, each predicting a specific aspect of a protein’s structure. The model predicts secondary structure elements (SSEs), backbone structure, and Cα atoms, and combines the results of each to produce a complete prediction map. The cascaded-CNN is a semantic segmentation image classifier and was trained on thousands of simulated density maps. The method is largely automatic and requires only a recommended threshold value for each protein density map. A specialized tabu-search path-walking algorithm was used to produce an initial backbone trace with Cα placements. A helix-refinement algorithm then improved the α-helix SSEs of the backbone trace. Finally, a novel quality assessment-based combinatorial algorithm was used to map protein sequences onto Cα traces to obtain full-atom protein structures. The method was tested on 50 experimental maps between 2.6 Å and 4.4 Å resolution. It outperformed several state-of-the-art prediction methods, including Rosetta de-novo, MAINMAST, and a Phenix-based method, by producing the most complete predicted protein structures, as measured by the percentage of found Cα atoms. The method accurately predicted 88.9% (mean) of the Cα atoms within 3 Å of a protein’s backbone structure, surpassing the 66.8% achieved by the leading alternative (the Phenix-based fully automatic method) on the same set of density maps. The C-CNN also achieved an average root-mean-square deviation (RMSD) of 1.24 Å on a set of 50 experimental density maps that had also been tested with the Phenix-based fully automatic method. The source code and a demo of this research have been published at https://github.com/DrDongSi/Ca-Backbone-Prediction.
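The two figures of merit quoted in the description (percentage of Cα atoms found within 3 Å, and RMSD) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `ca_metrics`, the nearest-neighbour matching rule, and the choice to compute RMSD only over matched atoms are assumptions made here for illustration, and the published method may pair atoms differently.

```python
import numpy as np

def ca_metrics(predicted, native, cutoff=3.0):
    """Return (percent of native C-alpha atoms found, RMSD over found atoms).

    Assumption: a native C-alpha counts as "found" when its nearest
    predicted C-alpha lies within `cutoff` angstroms. RMSD is taken over
    those matched nearest-neighbour distances only.
    """
    predicted = np.asarray(predicted, dtype=float)  # shape (n_pred, 3)
    native = np.asarray(native, dtype=float)        # shape (n_native, 3)

    # Pairwise distance matrix of shape (n_native, n_pred) via broadcasting.
    diff = native[:, None, :] - predicted[None, :, :]
    dist = np.linalg.norm(diff, axis=2)

    # Distance from each native atom to its closest predicted atom.
    nearest = dist.min(axis=1)
    found = nearest <= cutoff

    percent_found = 100.0 * found.mean()
    if found.any():
        rmsd = float(np.sqrt(np.mean(nearest[found] ** 2)))
    else:
        rmsd = float("nan")
    return percent_found, rmsd

# Toy coordinates in angstroms (hypothetical, not taken from the paper).
native = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]]
predicted = [[0.0, 0.0, 1.0], [3.8, 0.0, 0.0], [7.6, 2.0, 0.0], [30.0, 0.0, 0.0]]
percent_found, rmsd = ca_metrics(predicted, native)
print(f"found: {percent_found:.1f}%  RMSD: {rmsd:.2f} Å")
```

Nearest-neighbour matching is the simplest pairing rule; a one-to-one assignment between predicted and native atoms would be stricter and could give lower percentages on the same data.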
id pubmed-7063051
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7063051 2020-03-18 Sci Rep Article Nature Publishing Group UK 2020-03-09 /pmc/articles/PMC7063051/ /pubmed/32152330 http://dx.doi.org/10.1038/s41598-020-60598-y Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.