Detecting species-site dependencies in large multiple sequence alignments

Multiple sequence alignments (MSAs) are one of the most important sources of information in sequence analysis. Many methods have been proposed to detect, extract and visualize their most significant properties. To the same extent that site-specific methods like sequence logos successfully visualize...

Bibliographic Details
Main authors: Schwarz, Roland, Seibel, Philipp N., Rahmann, Sven, Schoen, Christoph, Huenerberg, Mirja, Müller-Reible, Clemens, Dandekar, Thomas, Karchin, Rachel, Schultz, Jörg, Müller, Tobias
Format: Text
Language: English
Published: Oxford University Press 2009
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2764451/
https://www.ncbi.nlm.nih.gov/pubmed/19661281
http://dx.doi.org/10.1093/nar/gkp634
author Schwarz, Roland
Seibel, Philipp N.
Rahmann, Sven
Schoen, Christoph
Huenerberg, Mirja
Müller-Reible, Clemens
Dandekar, Thomas
Karchin, Rachel
Schultz, Jörg
Müller, Tobias
collection PubMed
description Multiple sequence alignments (MSAs) are one of the most important sources of information in sequence analysis. Many methods have been proposed to detect, extract and visualize their most significant properties. To the same extent that site-specific methods like sequence logos successfully visualize site conservations and sequence-based methods like clustering approaches detect relationships between sequences, both types of methods fail at revealing informational elements of MSAs at the level of sequence–site interactions, i.e. finding clusters of sequences and sites responsible for their clustering, which together account for a high fraction of the overall information of the MSA. To fill this gap, we present here a method that combines the Fisher score-based embedding of sequences from a profile hidden Markov model (pHMM) with correspondence analysis. This method is capable of detecting and visualizing group-specific or conflicting signals in an MSA and allows for a detailed explorative investigation of alignments of any size tractable by pHMMs. Applications of our methods are exemplified on an alignment of the Neisseria surface antigen LP2086, where it is used to detect sites of recombinatory horizontal gene transfer and on the vitamin K epoxide reductase family to distinguish between evolutionary and functional signals.
format Text
id pubmed-2764451
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2764451 2009-10-20 Detecting species-site dependencies in large multiple sequence alignments Schwarz, Roland Seibel, Philipp N. Rahmann, Sven Schoen, Christoph Huenerberg, Mirja Müller-Reible, Clemens Dandekar, Thomas Karchin, Rachel Schultz, Jörg Müller, Tobias Nucleic Acids Res Computational Biology
Oxford University Press 2009-10 2009-08-06 /pmc/articles/PMC2764451/ /pubmed/19661281 http://dx.doi.org/10.1093/nar/gkp634 Text en © 2009 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Detecting species-site dependencies in large multiple sequence alignments
topic Computational Biology