Comparative study of the effectiveness and limitations of current methods for detecting sequence coevolution

Motivation: With rapid accumulation of sequence data on several species, extracting rational and systematic information from multiple sequence alignments (MSAs) is becoming increasingly important. Currently, there is a plethora of computational methods for investigating coupled evolutionary changes...

Bibliographic Details
Main Authors: Mao, Wenzhi; Kaya, Cihan; Dutta, Anindita; Horovitz, Amnon; Bahar, Ivet
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects: Original Papers
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4481699/
https://www.ncbi.nlm.nih.gov/pubmed/25697822
http://dx.doi.org/10.1093/bioinformatics/btv103
author Mao, Wenzhi
Kaya, Cihan
Dutta, Anindita
Horovitz, Amnon
Bahar, Ivet
collection PubMed
description Motivation: With rapid accumulation of sequence data on several species, extracting rational and systematic information from multiple sequence alignments (MSAs) is becoming increasingly important. Currently, there is a plethora of computational methods for investigating coupled evolutionary changes in pairs of positions along the amino acid sequence, and making inferences on structure and function. Yet, the significance of coevolution signals remains to be established. Also, a large number of false positives (FPs) arise from insufficient MSA size, phylogenetic background and indirect couplings. Results: Here, a set of 16 pairs of non-interacting proteins is thoroughly examined to assess the effectiveness and limitations of different methods. The analysis shows that recent computationally expensive methods designed to remove biases from indirect couplings outperform others in detecting tertiary structural contacts as well as eliminating intermolecular FPs; whereas traditional methods such as mutual information benefit from refinements such as shuffling, while being highly efficient. Computations repeated with 2,330 pairs of protein families from the Negatome database corroborated these results. Finally, using a training dataset of 162 families of proteins, we propose a combined method that outperforms existing individual methods. Overall, the study provides simple guidelines towards the choice of suitable methods and strategies based on available MSA size and computing resources. Availability and implementation: Software is freely available through the Evol component of ProDy API. Contact: bahar@pitt.edu Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-4481699
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4481699 2015-06-30 Comparative study of the effectiveness and limitations of current methods for detecting sequence coevolution Mao, Wenzhi Kaya, Cihan Dutta, Anindita Horovitz, Amnon Bahar, Ivet Bioinformatics Original Papers
Oxford University Press 2015-06-15 2015-02-19 /pmc/articles/PMC4481699/ /pubmed/25697822 http://dx.doi.org/10.1093/bioinformatics/btv103 Text en © The Author 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Comparative study of the effectiveness and limitations of current methods for detecting sequence coevolution
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4481699/
https://www.ncbi.nlm.nih.gov/pubmed/25697822
http://dx.doi.org/10.1093/bioinformatics/btv103