Multidimensional mutual information methods for the analysis of covariation in multiple sequence alignments

Bibliographic Details
Main authors: Clark, Greg W, Ackerman, Sharon H, Tillier, Elisabeth R, Gatti, Domenico L
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4046016/
https://www.ncbi.nlm.nih.gov/pubmed/24886131
http://dx.doi.org/10.1186/1471-2105-15-157
collection PubMed
description BACKGROUND: Several methods are available for the detection of covarying positions from a multiple sequence alignment (MSA). If the MSA contains a large number of sequences, information about the proximities between residues derived from covariation maps can be sufficient to predict a protein fold. However, in many cases the structure is already known, and information on the covarying positions can be valuable to understand the protein mechanism and dynamic properties. RESULTS: In this study we have sought to determine whether a multivariate (multidimensional) extension of traditional mutual information (MI) can be an additional tool to study covariation. The performance of two multidimensional MI (mdMI) methods, designed to remove the effect of ternary/quaternary interdependencies, was tested with a set of 9 MSAs each containing < 400 sequences, and was shown to be comparable to that of the newest methods based on maximum entropy/pseudolikelihood statistical models of protein sequences. However, while all the methods tested detected a similar number of covarying pairs among the residues separated by < 8 Å in the reference X-ray structures, on average there was less than 65% overlap between the top-scoring pairs detected by methods that are based on different principles. CONCLUSIONS: Given the large variety of structure and evolutionary history of different proteins, it is possible that a single best method to detect covariation in all proteins does not exist, and that for each protein family the best information can be derived by merging/comparing results obtained with different methods. This approach may be particularly valuable in those cases in which the size of the MSA is small or the quality of the alignment is low, leading to significant differences in the pairs detected by different methods.
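For readers unfamiliar with the baseline that the mdMI methods extend, traditional pairwise MI between two alignment columns can be sketched as below. This is a minimal illustration only, not the authors' mdMI method: it assumes the alignment is a Python list of equal-length strings and uses raw column frequencies (no pseudocounts, sequence weighting, gap filtering, or background correction). All helper names (`column`, `mutual_information`, `top_pairs`, `overlap_fraction`) are hypothetical, and `overlap_fraction` is just one assumed way to measure agreement between two top-scoring pair lists, not necessarily the overlap measure used in the study.

```python
import math
from collections import Counter
from itertools import combinations


def column(msa, i):
    """Return column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in msa]


def mutual_information(col_a, col_b):
    """Traditional pairwise MI (in nats) between two alignment columns:
    MI = sum over (x, y) of p(x,y) * log(p(x,y) / (p(x) * p(y))).
    Uses raw frequencies only (no pseudocounts or sequence weighting)."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        p_xy = c / n
        mi += p_xy * math.log(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi


def top_pairs(msa, k):
    """Score every column pair (i < j) by MI and return the k highest-scoring pairs."""
    length = len(msa[0])
    scores = {
        (i, j): mutual_information(column(msa, i), column(msa, j))
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def overlap_fraction(pairs_a, pairs_b):
    """Fraction of the pairs in pairs_a that are also present in pairs_b."""
    return len(set(pairs_a) & set(pairs_b)) / len(pairs_a)


if __name__ == "__main__":
    # Toy alignment: columns 0 and 1 covary perfectly, column 2 is invariant.
    msa = ["ACDA", "ACDC", "GEDA", "GEDC"]
    print(top_pairs(msa, 1))
```

In this toy alignment, columns 0 and 1 covary perfectly and score log 2, while the invariant column 2 and the independent column 3 score 0. Scores from two different methods could then be compared with `overlap_fraction` on their respective top-k lists.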
id pubmed-4046016
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4046016 2014-06-20. BMC Bioinformatics, Methodology Article. BioMed Central, 2014-05-22. Copyright © 2014 Clark et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Methodology Article