DNA word analysis based on the distribution of the distances between symmetric words

We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented. For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance distributio...

Bibliographic Details
Main authors: Tavares, Ana H. M. P., Pinho, Armando J., Silva, Raquel M., Rodrigues, João M. O. S., Bastos, Carlos A. C., Ferreira, Paulo J. S. G., Afreixo, Vera
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5428789/
https://www.ncbi.nlm.nih.gov/pubmed/28389642
http://dx.doi.org/10.1038/s41598-017-00646-2
_version_ 1783235901880532992
author Tavares, Ana H. M. P.
Pinho, Armando J.
Silva, Raquel M.
Rodrigues, João M. O. S.
Bastos, Carlos A. C.
Ferreira, Paulo J. S. G.
Afreixo, Vera
collection PubMed
description We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented. For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance distribution and with clusters of overrepresented short distances. We speculate that patterns of overrepresentation of short distances between symmetric word pairs may allow the occurrence of non-standard DNA conformations, such as hairpin/cruciform structures. We focused on the human genome, and analysed both the complete genome as well as a version with known repetitive sequences masked out. We reported several well-defined features in the distributions of distances, which can be classified into three different profiles, showing enrichment in distinct distance ranges. We analysed in greater detail certain pairs of symmetric words of length seven, found by our procedure, characterised by the surprising fact that they occur at single distances more frequently than expected.
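The idea of a distance distribution between symmetric words can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names are hypothetical, and the distance definition used here (from each occurrence of a word to the nearest following occurrence of its reversed complement) is an assumption made for illustration only.

```python
from collections import Counter

# Translation table for the four DNA bases (assumes an uppercase ACGT alphabet).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reversed_complement(word: str) -> str:
    """Return the reversed complement of a DNA word, e.g. AAC -> GTT."""
    return word.translate(_COMPLEMENT)[::-1]


def find_positions(sequence: str, word: str) -> list[int]:
    """Return the start index of every (possibly overlapping) occurrence of word."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        start = sequence.find(word, start + 1)
    return positions


def symmetric_distance_counts(sequence: str, word: str) -> Counter:
    """Count distances from each occurrence of word to the nearest
    following occurrence of its reversed complement.

    Assumption: this nearest-following definition is illustrative and
    may differ from the definition used in the article.
    """
    partner_positions = find_positions(sequence, reversed_complement(word))
    counts = Counter()
    j = 0
    for p in find_positions(sequence, word):
        # Advance to the first partner occurrence strictly after position p.
        while j < len(partner_positions) and partner_positions[j] <= p:
            j += 1
        if j == len(partner_positions):
            break  # no partner occurrence follows this or any later word
        counts[partner_positions[j] - p] += 1
    return counts


# Toy sequence, not real genomic data.
toy_counts = symmetric_distance_counts("AACGTTAACCCGTT", "AAC")
```

Comparing such empirical counts with the counts expected under a reference model is what would flag a distance as overrepresented; the choice of reference model is left open here.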
format Online
Article
Text
id pubmed-5428789
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-5428789 2017-05-15 DNA word analysis based on the distribution of the distances between symmetric words. Tavares, Ana H. M. P.; Pinho, Armando J.; Silva, Raquel M.; Rodrigues, João M. O. S.; Bastos, Carlos A. C.; Ferreira, Paulo J. S. G.; Afreixo, Vera. Sci Rep. Article. Nature Publishing Group UK 2017-04-07 /pmc/articles/PMC5428789/ /pubmed/28389642 http://dx.doi.org/10.1038/s41598-017-00646-2 Text en
© The Author(s) 2017. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title DNA word analysis based on the distribution of the distances between symmetric words
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5428789/
https://www.ncbi.nlm.nih.gov/pubmed/28389642
http://dx.doi.org/10.1038/s41598-017-00646-2