COUSCOus: improved protein contact prediction using an empirical Bayes covariance estimator

Bibliographic Details
Main Authors: Rawi, Reda, Mall, Raghvendra, Kunji, Khalid, El Anbari, Mohammed, Aupetit, Michael, Ullah, Ehsan, Bensmail, Halima
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5159955/
https://www.ncbi.nlm.nih.gov/pubmed/27978812
http://dx.doi.org/10.1186/s12859-016-1400-3
author Rawi, Reda
Mall, Raghvendra
Kunji, Khalid
El Anbari, Mohammed
Aupetit, Michael
Ullah, Ehsan
Bensmail, Halima
collection PubMed
description BACKGROUND: The post-genomic era, with its wealth of sequences, gave rise to a broad range of methods for detecting protein residue-residue contacts. Although coevolution methods such as PSICOV, DCA and plmDCA provide correct contact predictions, their predictions do not completely overlap. Hence, new approaches and improvements of existing methods are needed to drive further development and progress in the field. We present a new contact detection method, COUSCOus, which combines the best-performing shrinkage approach, the empirical Bayes covariance estimator, with GLasso. RESULTS: On the original PSICOV benchmark dataset, COUSCOus achieves mean accuracies of 0.74, 0.62 and 0.55 for the top L/10 predicted (L being the sequence length) long, medium and short range contacts, respectively. In addition, COUSCOus attains mean areas under the precision-recall curves of 0.25, 0.29 and 0.30 for long, medium and short range contacts, outperforming PSICOV. COUSCOus also outperforms PSICOV with respect to the Matthews correlation coefficient on the full list of residue contacts. Furthermore, on an independent test set composed of CASP11 protein targets, COUSCOus achieves on average a 10% gain in prediction accuracy compared to PSICOV. Finally, we show that in a simple random forest meta-classifier that combines contact detection techniques with sequence-derived features, PSICOV predictions should be replaced by the more accurate COUSCOus predictions. CONCLUSION: We conclude that adopting superior covariance shrinkage approaches will boost several research fields that apply the GLasso procedure, including the residue-residue contact prediction presented here as well as fields such as gene network reconstruction. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1400-3) contains supplementary material, which is available to authorized users.
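The accuracy figures in the abstract rank residue pairs by predicted score and count how many of the top L/10 pairs in each sequence-separation range are true contacts, while the Matthews correlation coefficient scores a thresholded list over all pairs. The following is a minimal sketch of both metrics under stated assumptions, not the authors' evaluation code. It assumes the common CASP separation bins (short 6-11, medium 12-23, long 24 or more residues apart), a minimum separation of 6 for the full-list MCC, and a denominator equal to the number of predictions actually available in a range; none of these details is given in this record. The functions `top_l10_accuracy` and `mcc_full_list` and the toy data are hypothetical.

```python
import math
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[int, int]

# Assumed sequence-separation bins (common CASP convention); the exact
# thresholds used in the COUSCOus evaluation are not stated in this record.
RANGES: Dict[str, Tuple[int, Optional[int]]] = {
    "short": (6, 11),
    "medium": (12, 23),
    "long": (24, None),
}


def _norm(p: Pair) -> Pair:
    """Store every residue pair as (i, j) with i < j."""
    i, j = p
    return (i, j) if i < j else (j, i)


def _in_range(p: Pair, lo: int, hi: Optional[int]) -> bool:
    """True if the pair's sequence separation falls inside [lo, hi]."""
    sep = abs(p[1] - p[0])
    return sep >= lo and (hi is None or sep <= hi)


def top_l10_accuracy(scores: Dict[Pair, float], contacts: Set[Pair],
                     length: int, range_name: str) -> float:
    """Fraction of true contacts among the top L/10 scored pairs in one range."""
    lo, hi = RANGES[range_name]
    true = {_norm(p) for p in contacts}
    # Highest score first; ties are broken by pair index.
    ranked = sorted(
        ((s, _norm(p)) for p, s in scores.items() if _in_range(p, lo, hi)),
        reverse=True,
    )
    top = [p for _, p in ranked[: max(1, length // 10)]]
    # Assumption: divide by the number of predictions actually made in this range.
    return sum(p in true for p in top) / len(top) if top else 0.0


def mcc_full_list(predicted: Set[Pair], contacts: Set[Pair],
                  length: int, min_sep: int = 6) -> float:
    """Matthews correlation coefficient over every pair with |i - j| >= min_sep."""
    pred = {_norm(p) for p in predicted}
    true = {_norm(p) for p in contacts}
    tp = fp = tn = fn = 0
    for i in range(length):
        for j in range(i + min_sep, length):
            is_pred, is_true = (i, j) in pred, (i, j) in true
            if is_pred and is_true:
                tp += 1
            elif is_pred:
                fp += 1
            elif is_true:
                fn += 1
            else:
                tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example with hypothetical scores for a 40-residue protein.
L = 40
contacts = {(0, 30), (1, 31), (2, 10), (5, 20)}
scores = {(0, 30): 0.9, (1, 31): 0.8, (3, 35): 0.7, (4, 36): 0.6,
          (7, 39): 0.5, (2, 10): 0.95, (5, 20): 0.4, (6, 18): 0.3}
for name in ("long", "medium", "short"):
    print(name, top_l10_accuracy(scores, contacts, L, name))
print("MCC", round(mcc_full_list({(0, 30), (1, 31), (3, 35)}, contacts, L), 3))
```

On the toy data this prints long 0.5, medium 0.5, short 1.0 and MCC 0.575. With L = 40 only four pairs per range are evaluated, which is why a single extra hit or miss moves the top-L/10 accuracy in large steps; a real benchmark would also need an explicit tie-breaking rule and the exact range thresholds used by the authors.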
format Online
Article
Text
id pubmed-5159955
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5159955 2016-12-23 BMC Bioinformatics Research Article BioMed Central 2016-12-15 /pmc/articles/PMC5159955/ /pubmed/27978812 http://dx.doi.org/10.1186/s12859-016-1400-3 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title COUSCOus: improved protein contact prediction using an empirical Bayes covariance estimator
topic Research Article