
Universal sequence map (USM) of arbitrary discrete sequences

BACKGROUND: For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties...


Bibliographic Details
Main authors: Almeida, Jonas S; Vinga, Susana
Format: Text
Language: English
Published: BioMed Central 2002
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC90187/
https://www.ncbi.nlm.nih.gov/pubmed/11895567
http://dx.doi.org/10.1186/1471-2105-3-6
author Almeida, Jonas S
Vinga, Susana
author_facet Almeida, Jonas S
Vinga, Susana
author_sort Almeida, Jonas S
collection PubMed
description BACKGROUND: For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but has not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space, conserving all its statistical properties. Ideally, such a representation would allow scale-independent sequence analysis – without the context of fixed memory length. A simple example would be the ability to infer the homology between two sequences solely by comparing the coordinates of any two homologous units. RESULTS: We have successfully identified such an iterative function for bijective mapping ψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences of arbitrary length with an arbitrary number of unique units, and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). The latter enables the representation of sequences with four unit types (such as DNA) as an order-free Markov chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool: http://bioinformatics.musc.edu/~jonas/usm/. CONCLUSIONS: USM is shown to enable a statistical mechanics approach to sequence analysis. The scale-independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules.
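The iterated mapping that the description refers to can be illustrated with a minimal Python sketch of the classic Chaos Game Representation: each unit is assigned a corner of the unit square, and every new point is the midpoint between the previous point and the corner of the current unit. This is a sketch under stated assumptions, not the authors' implementation: the function names (`cgr`, `binary_corners`, `similarity_estimate`), the corner assignment, the fixed starting point (0.5, 0.5), and the binary-corner generalization to larger alphabets are illustrative choices made here, and only the forward iteration is shown.

```python
from math import ceil, log2


def binary_corners(alphabet):
    """Assign each unique unit a corner of the unit hypercube (assumed encoding).

    Uses ceil(log2(n)) dimensions for n unique units, at least one.
    """
    units = sorted(set(alphabet))
    n_dims = max(1, ceil(log2(len(units))))
    return {
        unit: tuple(float((index >> dim) & 1) for dim in range(n_dims))
        for index, unit in enumerate(units)
    }


def cgr(sequence, corners=None):
    """Return the trajectory of midpoint iterations for a sequence."""
    if corners is None:
        # A common corner assignment for DNA; an assumption of this sketch.
        corners = {"A": (0.0, 0.0), "C": (0.0, 1.0),
                   "G": (1.0, 1.0), "T": (1.0, 0.0)}
    n_dims = len(next(iter(corners.values())))
    point = (0.5,) * n_dims  # assumed starting point: centre of the hypercube
    trajectory = []
    for unit in sequence:
        corner = corners[unit]
        # Move halfway from the current point toward the unit's corner.
        point = tuple((p + c) / 2 for p, c in zip(point, corner))
        trajectory.append(point)
    return trajectory


def similarity_estimate(p, q):
    """Estimate the number of shared trailing units from two map positions.

    Each shared unit halves the coordinate difference, so -log2 of the
    largest difference is at least the length of the shared suffix.
    """
    largest = max(abs(a - b) for a, b in zip(p, q))
    return float("inf") if largest == 0 else -log2(largest)


if __name__ == "__main__":
    end_a = cgr("ACGT")[-1]
    end_b = cgr("TCGT")[-1]
    print(end_a, end_b, similarity_estimate(end_a, end_b))
```

Because every step halves the distance between two trajectories that receive the same unit, two positions reached through k identical trailing units differ by at most 2^-k in every coordinate. The estimate is therefore a bound rather than an exact count: it can exceed the true number of shared units when the earlier, differing units happen to share a coordinate.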
format Text
id pubmed-90187
institution National Center for Biotechnology Information
language English
publishDate 2002
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-90187 2002-03-19 Universal sequence map (USM) of arbitrary discrete sequences Almeida, Jonas S Vinga, Susana BMC Bioinformatics Methodology article
BioMed Central 2002-02-05 /pmc/articles/PMC90187/ /pubmed/11895567 http://dx.doi.org/10.1186/1471-2105-3-6 Text en Copyright ©2002 Almeida and Vinga; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
spellingShingle Methodology article
Almeida, Jonas S
Vinga, Susana
Universal sequence map (USM) of arbitrary discrete sequences
title Universal sequence map (USM) of arbitrary discrete sequences
topic Methodology article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC90187/
https://www.ncbi.nlm.nih.gov/pubmed/11895567
http://dx.doi.org/10.1186/1471-2105-3-6