Bios2mds: an R package for comparing orthologous protein families by metric multidimensional scaling

BACKGROUND: The distance matrix computed from multiple alignments of homologous sequences is widely used by distance-based phylogenetic methods to provide information on the evolution of protein families. This matrix can also be visualized in a low dimensional space by metric multidimensional scalin...

Bibliographic Details
Main Authors: Pelé, Julien; Bécu, Jean-Michel; Abdi, Hervé; Chabbert, Marie
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3403911/
https://www.ncbi.nlm.nih.gov/pubmed/22702410
http://dx.doi.org/10.1186/1471-2105-13-133
author Pelé, Julien
Bécu, Jean-Michel
Abdi, Hervé
Chabbert, Marie
collection PubMed
description BACKGROUND: The distance matrix computed from multiple alignments of homologous sequences is widely used by distance-based phylogenetic methods to provide information on the evolution of protein families. This matrix can also be visualized in a low dimensional space by metric multidimensional scaling (MDS). Applied to protein families, MDS provides information complementary to the information derived from tree-based methods. Moreover, MDS gives a unique opportunity to compare orthologous sequence sets because it can add supplementary elements to a reference space. RESULTS: The R package bios2mds (from BIOlogical Sequences to MultiDimensional Scaling) has been designed to analyze multiple sequence alignments by MDS. Bios2mds starts with a sequence alignment, builds a matrix of distances between the aligned sequences, and represents this matrix by MDS to visualize a sequence space. This package also offers the possibility of performing K-means clustering in the MDS derived sequence space. Most importantly, bios2mds includes a function that projects supplementary elements (a.k.a. “out of sample” elements) onto the space defined by reference or “active” elements. Orthologous sequence sets can thus be compared in a straightforward way. The data analysis and visualization tools have been specifically designed for an easy monitoring of the evolutionary drift of protein sub-families. CONCLUSIONS: The bios2mds package provides the tools for a complete integrated pipeline aimed at the MDS analysis of multiple sets of orthologous sequences in the R statistical environment. In addition, as the analysis can be carried out from user provided matrices, the projection function can be widely used on any kind of data.
format Online
Article
Text
id pubmed-3403911
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3403911 2012-07-25 Bios2mds: an R package for comparing orthologous protein families by metric multidimensional scaling Pelé, Julien Bécu, Jean-Michel Abdi, Hervé Chabbert, Marie BMC Bioinformatics Software
BioMed Central 2012-06-15 /pmc/articles/PMC3403911/ /pubmed/22702410 http://dx.doi.org/10.1186/1471-2105-13-133 Text en Copyright ©2012 Pelé et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Bios2mds: an R package for comparing orthologous protein families by metric multidimensional scaling
topic Software
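
The abstract describes two steps that are easier to follow with a concrete sketch: representing a distance matrix by metric MDS, and projecting supplementary ("out of sample") elements onto the space defined by the active elements. The Python/NumPy code below is an illustration only and is not the bios2mds API (bios2mds is an R package). The function names `metric_mds`, `project_supplementary`, and `euclidean` are hypothetical, the toy 2-D points stand in for aligned sequences, and equal weights for all active elements are assumed.

```python
import numpy as np


def euclidean(points_a, points_b):
    """Pairwise Euclidean distances between two sets of points (toy stand-in
    for sequence distances; an assumption made for this example)."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)


def metric_mds(dist_active, n_dims=2):
    """Metric (classical) MDS of the active distance matrix.

    Returns the coordinates of the active elements and the retained
    eigenvalues. Assumes the retained eigenvalues are strictly positive.
    """
    d2 = np.asarray(dist_active, dtype=float) ** 2
    n = d2.shape[0]
    # Double-centering turns squared distances into a cross-product matrix.
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    cross_product = -0.5 * centering @ d2 @ centering
    eigvals, eigvecs = np.linalg.eigh(cross_product)
    top = np.argsort(eigvals)[::-1][:n_dims]
    eigvals, eigvecs = eigvals[top], eigvecs[:, top]
    coords = eigvecs * np.sqrt(eigvals)
    return coords, eigvals


def project_supplementary(dist_sup_to_active, dist_active, coords, eigvals):
    """Project supplementary elements onto the active MDS space.

    dist_sup_to_active has one row per supplementary element and one
    column per active element.
    """
    d2_active = np.asarray(dist_active, dtype=float) ** 2
    d2_sup = np.asarray(dist_sup_to_active, dtype=float).T ** 2
    n = d2_active.shape[0]
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    # Center the supplementary squared distances with the active means.
    mean_sq = d2_active.mean(axis=1, keepdims=True)
    cross_sup = -0.5 * centering @ (d2_sup - mean_sq)
    return cross_sup.T @ coords / eigvals


# Toy data: five active elements and two supplementary ones.
active = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 0.5]])
extra = np.array([[0.5, 0.5], [3, 1]])

dist_active = euclidean(active, active)
dist_extra = euclidean(extra, active)

coords, eigvals = metric_mds(dist_active)
extra_coords = project_supplementary(dist_extra, dist_active, coords, eigvals)
```

Because the supplementary elements are positioned using only their distances to the active elements, the reference space itself is unchanged by the projection. This is what allows a second set of orthologous sequences to be laid over a reference set for comparison.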