Computing distribution of scale independent motifs in biological sequences

The use of Chaos Game Representation (CGR) or its generalization, Universal Sequence Maps (USM), to describe the distribution of biological sequences has been found objectionable because of the fractal structure of that coordinate system. Consequently, the investigation of distribution of symbolic m...

Bibliographic Details
Main authors: Almeida, Jonas S; Vinga, Susana
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1630425/
https://www.ncbi.nlm.nih.gov/pubmed/17049089
http://dx.doi.org/10.1186/1748-7188-1-18
_version_ 1782130624251822080
author Almeida, Jonas S
Vinga, Susana
author_facet Almeida, Jonas S
Vinga, Susana
author_sort Almeida, Jonas S
collection PubMed
description The use of Chaos Game Representation (CGR) or its generalization, Universal Sequence Maps (USM), to describe the distribution of biological sequences has been found objectionable because of the fractal structure of that coordinate system. Consequently, the investigation of distribution of symbolic motifs at multiple scales is hampered by an inexact association between distance and sequence dissimilarity. A solution to this problem could unleash the use of iterative maps as phase-state representation of sequences where its statistical properties can be conveniently investigated. In this study a family of kernel density functions is described that accommodates the fractal nature of iterative function representations of symbolic sequences and, consequently, enables the exact investigation of sequence motifs of arbitrary lengths in that scale-independent representation. Furthermore, the proposed kernel density includes both Markovian succession and currently used alignment-free sequence dissimilarity metrics as special solutions. Therefore, the fractal kernel described is in fact a generalization that provides a common framework for a diverse suite of sequence analysis techniques.
format Text
id pubmed-1630425
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1630425 2006-11-06 Computing distribution of scale independent motifs in biological sequences Almeida, Jonas S Vinga, Susana Algorithms Mol Biol Research The use of Chaos Game Representation (CGR) or its generalization, Universal Sequence Maps (USM), to describe the distribution of biological sequences has been found objectionable because of the fractal structure of that coordinate system. Consequently, the investigation of distribution of symbolic motifs at multiple scales is hampered by an inexact association between distance and sequence dissimilarity. A solution to this problem could unleash the use of iterative maps as phase-state representation of sequences where its statistical properties can be conveniently investigated. In this study a family of kernel density functions is described that accommodates the fractal nature of iterative function representations of symbolic sequences and, consequently, enables the exact investigation of sequence motifs of arbitrary lengths in that scale-independent representation. Furthermore, the proposed kernel density includes both Markovian succession and currently used alignment-free sequence dissimilarity metrics as special solutions. Therefore, the fractal kernel described is in fact a generalization that provides a common framework for a diverse suite of sequence analysis techniques. BioMed Central 2006-10-18 /pmc/articles/PMC1630425/ /pubmed/17049089 http://dx.doi.org/10.1186/1748-7188-1-18 Text en Copyright © 2006 Almeida and Vinga; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Almeida, Jonas S
Vinga, Susana
Computing distribution of scale independent motifs in biological sequences
title Computing distribution of scale independent motifs in biological sequences
title_full Computing distribution of scale independent motifs in biological sequences
title_fullStr Computing distribution of scale independent motifs in biological sequences
title_full_unstemmed Computing distribution of scale independent motifs in biological sequences
title_short Computing distribution of scale independent motifs in biological sequences
title_sort computing distribution of scale independent motifs in biological sequences
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1630425/
https://www.ncbi.nlm.nih.gov/pubmed/17049089
http://dx.doi.org/10.1186/1748-7188-1-18
work_keys_str_mv AT almeidajonass computingdistributionofscaleindependentmotifsinbiologicalsequences
AT vingasusana computingdistributionofscaleindependentmotifsinbiologicalsequences