Mutual information and variants for protein domain-domain contact prediction

BACKGROUND: Predicting protein contacts solely based on sequence information remains a challenging problem, despite the huge amount of sequence data at our disposal. Mutual Information (MI), an information theory measure, has been extensively employed and modified to identify residues within a prote...

Bibliographic Details
Main authors: Gomes, Mireille; Hamer, Rebecca; Reinert, Gesine; Deane, Charlotte M
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3532072/
https://www.ncbi.nlm.nih.gov/pubmed/23244412
http://dx.doi.org/10.1186/1756-0500-5-472
author Gomes, Mireille
Hamer, Rebecca
Reinert, Gesine
Deane, Charlotte M
author_sort Gomes, Mireille
collection PubMed
description BACKGROUND: Predicting protein contacts solely based on sequence information remains a challenging problem, despite the huge amount of sequence data at our disposal. Mutual Information (MI), an information theory measure, has been extensively employed and modified to identify residues within a protein (intra-protein) that are in contact. More recently MI and its variants have also been used in the prediction of contacts between proteins (inter-protein). METHODS: Here we assess the predictive power of MI and variants for domain-domain contact prediction. We test original MI and these variants, which are called MIp, MIc and ZNMI, on 40 domain-domain test cases containing 10,753 sequences. We also propose and evaluate two new versions of MI that consider triangles of residues and the physicochemical properties of the amino acids, respectively. RESULTS: We found that all versions of MI are skewed towards predicting surface residues. Since domain-domain contacts are on the surface of each domain, we considered only surface residues when attempting to predict contacts. Our analysis shows that MIc is the best current MI domain-domain contact predictor. At 20% recall MIc achieved a precision of 44.9% when only surface residues were considered. Our triangle and reduced alphabet variants of MI highlight the delicate trade-off between signal and noise in the use of MI for domain-domain contact prediction. We also examine a specific “successful” case study and demonstrate that here, when considering surface residues, even the most accurate domain-domain contact predictor, MIc, performs no better than random. CONCLUSIONS: All tested variants of MI are skewed towards predicting surface residues. When considering surface residues only, we find MIc to be the best current MI domain-domain contact predictor. Its performance, however, is not as good as a non-MI based contact predictor, i-Patch. Additionally, the intra-protein contact prediction capabilities of MIc outperform its domain-domain contact prediction abilities.
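For readers unfamiliar with the measure named in the abstract, the following is a minimal sketch of plain mutual information between two alignment columns, for example one column from each domain of a paired alignment. It is not the authors' implementation and does not cover the MIp, MIc or ZNMI corrections; the function name `mutual_information` and the toy columns are assumptions made for illustration, and gap handling is ignored.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Return the mutual information (in bits) between two alignment columns.

    col_a and col_b are equally long sequences of residue symbols, where
    position k in each column comes from the same aligned sequence.
    Hypothetical helper for illustration only.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must contain the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)               # residue counts in column a
    count_b = Counter(col_b)               # residue counts in column b
    count_ab = Counter(zip(col_a, col_b))  # joint counts of residue pairs
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with counts to limit rounding error
        mi += p_ab * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Toy data (hypothetical): the first pair of columns covaries perfectly,
# the second pair is statistically independent.
covarying = mutual_information("AAVV", "LLII")
independent = mutual_information("AAVV", "LILI")
print(covarying, independent)
```

Scoring every pair of columns this way and ranking the pairs gives a raw contact ranking; the variants named in the abstract modify this raw score.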
format Online
Article
Text
id pubmed-3532072
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3532072 2013-01-03 Mutual information and variants for protein domain-domain contact prediction Gomes, Mireille Hamer, Rebecca Reinert, Gesine Deane, Charlotte M BMC Res Notes Research Article BioMed Central 2012-08-31 /pmc/articles/PMC3532072/ /pubmed/23244412 http://dx.doi.org/10.1186/1756-0500-5-472 Text en Copyright ©2012 Gomes et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Mutual information and variants for protein domain-domain contact prediction
topic Research Article