Detecting species-site dependencies in large multiple sequence alignments
Multiple sequence alignments (MSAs) are one of the most important sources of information in sequence analysis. Many methods have been proposed to detect, extract and visualize their most significant properties. To the same extent that site-specific methods like sequence logos successfully visualize...
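Main authors: | Schwarz, Roland; Seibel, Philipp N.; Rahmann, Sven; Schoen, Christoph; Huenerberg, Mirja; Müller-Reible, Clemens; Dandekar, Thomas; Karchin, Rachel; Schultz, Jörg; Müller, Tobias |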
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2009 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2764451/ https://www.ncbi.nlm.nih.gov/pubmed/19661281 http://dx.doi.org/10.1093/nar/gkp634 |
_version_ | 1782173086196432896 |
---|---|
author | Schwarz, Roland; Seibel, Philipp N.; Rahmann, Sven; Schoen, Christoph; Huenerberg, Mirja; Müller-Reible, Clemens; Dandekar, Thomas; Karchin, Rachel; Schultz, Jörg; Müller, Tobias |
author_sort | Schwarz, Roland |
collection | PubMed |
description | Multiple sequence alignments (MSAs) are one of the most important sources of information in sequence analysis. Many methods have been proposed to detect, extract and visualize their most significant properties. To the same extent that site-specific methods like sequence logos successfully visualize site conservations and sequence-based methods like clustering approaches detect relationships between sequences, both types of methods fail at revealing informational elements of MSAs at the level of sequence–site interactions, i.e. finding clusters of sequences and sites responsible for their clustering, which together account for a high fraction of the overall information of the MSA. To fill this gap, we present here a method that combines the Fisher score-based embedding of sequences from a profile hidden Markov model (pHMM) with correspondence analysis. This method is capable of detecting and visualizing group-specific or conflicting signals in an MSA and allows for a detailed explorative investigation of alignments of any size tractable by pHMMs. Applications of our methods are exemplified on an alignment of the Neisseria surface antigen LP2086, where it is used to detect sites of recombinatory horizontal gene transfer and on the vitamin K epoxide reductase family to distinguish between evolutionary and functional signals. |
format | Text |
id | pubmed-2764451 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-2764451 2009-10-20 Nucleic Acids Res Computational Biology Oxford University Press 2009-10 2009-08-06 /pmc/articles/PMC2764451/ /pubmed/19661281 http://dx.doi.org/10.1093/nar/gkp634 Text en © 2009 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Detecting species-site dependencies in large multiple sequence alignments |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2764451/ https://www.ncbi.nlm.nih.gov/pubmed/19661281 http://dx.doi.org/10.1093/nar/gkp634 |