Predicting protein inter-residue contacts using composite likelihood maximization and deep learning

Bibliographic Details
Main authors: Zhang, Haicang; Zhang, Qi; Ju, Fusong; Zhu, Jianwei; Gao, Yujuan; Xie, Ziwei; Deng, Minghua; Sun, Shiwei; Zheng, Wei-Mou; Bu, Dongbo
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6821021/
https://www.ncbi.nlm.nih.gov/pubmed/31664895
http://dx.doi.org/10.1186/s12859-019-3051-7
author Zhang, Haicang
Zhang, Qi
Ju, Fusong
Zhu, Jianwei
Gao, Yujuan
Xie, Ziwei
Deng, Minghua
Sun, Shiwei
Zheng, Wei-Mou
Bu, Dongbo
author_sort Zhang, Haicang
collection PubMed
description BACKGROUND: Accurate prediction of a protein's inter-residue contacts is important for calculating its tertiary structure. Analysis of co-evolutionary events among residues has proven effective in inferring inter-residue contacts. The Markov random field (MRF) technique, although widely used for contact prediction, suffers from the following dilemma: the actual likelihood function of an MRF is accurate but time-consuming to calculate, whereas approximations to the actual likelihood, such as pseudo-likelihood, are efficient to calculate but inaccurate. Thus, achieving both accuracy and efficiency simultaneously remains a challenge. RESULTS: In this study, we present such an approach (called clmDCA) for contact prediction. Unlike plmDCA, which uses pseudo-likelihood, i.e., the product of the conditional probabilities of individual residues, our approach uses composite likelihood, i.e., the product of the conditional probabilities of all residue pairs. Composite likelihood has been theoretically proven to be a better approximation to the actual likelihood function than pseudo-likelihood. Meanwhile, composite likelihood is still efficient to maximize, thus ensuring the efficiency of clmDCA. We present comprehensive experiments on popular benchmark datasets, including the PSICOV and CASP-11 datasets, to show that: i) clmDCA alone outperforms the existing MRF-based approaches in prediction accuracy; ii) when equipped with a deep learning technique for refinement, the prediction accuracy of clmDCA was further significantly improved, suggesting the suitability of clmDCA for subsequent refinement procedures. We further present a successful application of the predicted contacts to accurately build tertiary structures for proteins in the PSICOV dataset. CONCLUSIONS: The composite likelihood maximization algorithm can efficiently estimate the parameters of Markov random fields and can improve the prediction accuracy of protein inter-residue contacts.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-3051-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6821021
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6821021 2019-11-04 BMC Bioinformatics Methodology Article BioMed Central 2019-10-29 /pmc/articles/PMC6821021/ /pubmed/31664895 http://dx.doi.org/10.1186/s12859-019-3051-7 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Predicting protein inter-residue contacts using composite likelihood maximization and deep learning
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6821021/
https://www.ncbi.nlm.nih.gov/pubmed/31664895
http://dx.doi.org/10.1186/s12859-019-3051-7