Comparative analysis of long DNA sequences by per element information content using different contexts

Bibliographic details
Main authors: Dix, Trevor I; Powell, David R; Allison, Lloyd; Bernal, Julie; Jaeger, Samira; Stern, Linda
Format: Text
Language: English
Published: BioMed Central 2007
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1892068/
https://www.ncbi.nlm.nih.gov/pubmed/17493248
http://dx.doi.org/10.1186/1471-2105-8-S2-S10

Abstract

BACKGROUND: Features of a DNA sequence can be found by compressing the sequence under a suitable model; good compression implies low information content. Good DNA compression models consider repetition, differences between repeats, and base distributions. From a linear DNA sequence, a compression model can produce a linear information sequence. Linear space complexity is important when exploring long DNA sequences of the order of millions of bases. Compressing a sequence in isolation includes information on self-repetition, whereas compressing a sequence Y in the context of another sequence X can find what new information X gives about Y. This paper presents a methodology for performing comparative analysis to find features exposed by such models.

RESULTS: We apply such a model to find features across chromosomes of Cyanidioschyzon merolae. We present a tool that provides useful linear transformations to investigate and save new sequences. Various examples illustrate the methodology, finding features for sequences alone and in different contexts. We also show how to highlight all sets of self-repetition features, in this case within Plasmodium falciparum chromosome 2.

CONCLUSION: The methodology finds features that are significant and that biologists confirm. The exploration of long information sequences in linear time and space is fast, and the saved results are self-documenting.
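
The abstract's central idea, that a model's per-base coding cost forms an information sequence and that priming the model on X shows what X tells us about Y, can be illustrated with a small sketch. This is not the authors' model: it assumes a plain adaptive order-k Markov model with add-one counts, which ignores the repeats and differences between repeats that the paper's models capture. The names `information_sequence` and `moving_average` and the parameter `k` are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def information_sequence(y, context="", k=2):
    """Return the per-base information content of y, in bits.

    Assumption: a simple adaptive order-k Markov model with add-one (Laplace)
    counts stands in for a real DNA compression model. If `context` (the
    sequence X) is given, the model is first trained on it at no cost, so the
    result approximates the information in Y given X.
    """
    # counts[history][base]: times `base` followed `history`, starting at 1
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 1))

    def history(seq, i):
        # Up to k preceding bases within the same sequence
        return seq[max(0, i - k):i]

    for seq in (context, y):
        if any(b not in ALPHABET for b in seq):
            raise ValueError("only uppercase A, C, G, T are supported here")

    # Train on the context X without recording any cost
    for i, base in enumerate(context):
        counts[history(context, i)][base] += 1

    info = []
    for i, base in enumerate(y):
        table = counts[history(y, i)]
        p = table[base] / sum(table.values())
        info.append(-math.log2(p))  # cost of this base in bits
        table[base] += 1            # adapt after coding, as a decoder could
    return info


def moving_average(values, window):
    """Trailing-window mean computed in linear time with a running sum."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        out.append(total / window)
    return out


if __name__ == "__main__":
    x = "ACGTTGCAACGT" * 5
    y = x
    alone = information_sequence(y)
    given_x = information_sequence(y, context=x)
    # Positive values mark bases that X helps explain
    gain = [a - b for a, b in zip(alone, given_x)]
    print(sum(alone), sum(given_x))
    print(moving_average(gain, 5)[:3])
```

Subtracting the conditional sequence from the unconditioned one gives a per-base gain, and smoothing it with a moving average (one kind of linear transformation) makes regions that X explains easier to spot. For a fixed k, both steps take time and space linear in the sequence lengths.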

Journal: BMC Bioinformatics (Research), published 2007-05-03
Copyright © 2007 Dix et al; licensee BioMed Central Ltd.
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.