
Folding non-homologous proteins by coupling deep-learning contact maps with I-TASSER assembly simulations

Structure prediction for proteins lacking homologous templates in the Protein Data Bank (PDB) remains a significant unsolved problem. We developed a protocol, C-I-TASSER, to integrate interresidue contact maps from deep neural-network learning with the cutting-edge I-TASSER fragment assembly simulat...


Bibliographic Details
Main authors: Zheng, Wei; Zhang, Chengxin; Li, Yang; Pearce, Robin; Bell, Eric W.; Zhang, Yang
Format: Online Article Text
Language: English
Published: Elsevier 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8336924/
https://www.ncbi.nlm.nih.gov/pubmed/34355210
http://dx.doi.org/10.1016/j.crmeth.2021.100014
_version_ 1783733403944747008
author Zheng, Wei
Zhang, Chengxin
Li, Yang
Pearce, Robin
Bell, Eric W.
Zhang, Yang
collection PubMed
description Structure prediction for proteins lacking homologous templates in the Protein Data Bank (PDB) remains a significant unsolved problem. We developed a protocol, C-I-TASSER, to integrate interresidue contact maps from deep neural-network learning with the cutting-edge I-TASSER fragment assembly simulations. Large-scale benchmark tests showed that C-I-TASSER can fold more than twice as many non-homologous proteins as I-TASSER, which does not use contacts. When applied to a folding experiment on 8,266 unsolved Pfam families, C-I-TASSER successfully folded 4,162 domain families, including 504 folds that are not found in the PDB. Furthermore, it created correct folds for 85% of proteins in the SARS-CoV-2 genome, despite the high mutation rate of the virus and sparse sequence profiles. The results demonstrated the critical importance of coupling whole-genome and metagenome-based evolutionary information with optimal structure assembly simulations for solving the problem of non-homologous protein structure prediction.
format Online
Article
Text
id pubmed-8336924
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-8336924 2021-08-04 Folding non-homologous proteins by coupling deep-learning contact maps with I-TASSER assembly simulations Zheng, Wei Zhang, Chengxin Li, Yang Pearce, Robin Bell, Eric W. Zhang, Yang Cell Rep Methods Article Structure prediction for proteins lacking homologous templates in the Protein Data Bank (PDB) remains a significant unsolved problem. We developed a protocol, C-I-TASSER, to integrate interresidue contact maps from deep neural-network learning with the cutting-edge I-TASSER fragment assembly simulations. Large-scale benchmark tests showed that C-I-TASSER can fold more than twice as many non-homologous proteins as I-TASSER, which does not use contacts. When applied to a folding experiment on 8,266 unsolved Pfam families, C-I-TASSER successfully folded 4,162 domain families, including 504 folds that are not found in the PDB. Furthermore, it created correct folds for 85% of proteins in the SARS-CoV-2 genome, despite the high mutation rate of the virus and sparse sequence profiles. The results demonstrated the critical importance of coupling whole-genome and metagenome-based evolutionary information with optimal structure assembly simulations for solving the problem of non-homologous protein structure prediction. Elsevier 2021-06-21 /pmc/articles/PMC8336924/ /pubmed/34355210 http://dx.doi.org/10.1016/j.crmeth.2021.100014 Text en © 2021 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Folding non-homologous proteins by coupling deep-learning contact maps with I-TASSER assembly simulations
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8336924/
https://www.ncbi.nlm.nih.gov/pubmed/34355210
http://dx.doi.org/10.1016/j.crmeth.2021.100014