DNA word analysis based on the distribution of the distances between symmetric words
We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented. For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance distributio...
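As an illustration of the idea in the summary above, the following is a minimal Python sketch of how distances between a word and its reversed complement could be tallied in a sequence. It is not the authors' procedure: the function names are hypothetical, and the distance definition used here (from each occurrence of a word to the nearest following occurrence of its reversed complement, measured between start positions) is an assumption made for the example.

```python
from bisect import bisect_right
from collections import Counter

# Translation table mapping each nucleotide to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reversed_complement(word: str) -> str:
    """Return the reversed complement of a DNA word, e.g. AAC -> GTT."""
    return word.translate(COMPLEMENT)[::-1]


def find_positions(sequence: str, word: str) -> list[int]:
    """Return the start positions of all (possibly overlapping) occurrences."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        start = sequence.find(word, start + 1)
    return positions


def symmetric_distance_counts(sequence: str, word: str) -> Counter:
    """Count distances from each occurrence of `word` to the nearest
    following occurrence of its reversed complement (assumed definition)."""
    partner_positions = find_positions(sequence, reversed_complement(word))
    counts = Counter()
    for p in find_positions(sequence, word):
        # Index of the first partner position strictly greater than p.
        i = bisect_right(partner_positions, p)
        if i < len(partner_positions):
            counts[partner_positions[i] - p] += 1
    return counts


if __name__ == "__main__":
    toy_sequence = "AACGTTAACCGTT"  # made-up toy sequence, not real genomic data
    print(symmetric_distance_counts(toy_sequence, "AAC"))
```

Comparing such an empirical distance histogram with the one expected under a reference model is what would reveal overrepresented distances; that comparison step is not sketched here.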
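Main Authors: | Tavares, Ana H. M. P.; Pinho, Armando J.; Silva, Raquel M.; Rodrigues, João M. O. S.; Bastos, Carlos A. C.; Ferreira, Paulo J. S. G.; Afreixo, Vera |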
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2017 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5428789/ https://www.ncbi.nlm.nih.gov/pubmed/28389642 http://dx.doi.org/10.1038/s41598-017-00646-2 |
_version_ | 1783235901880532992 |
---|---|
author | Tavares, Ana H. M. P.; Pinho, Armando J.; Silva, Raquel M.; Rodrigues, João M. O. S.; Bastos, Carlos A. C.; Ferreira, Paulo J. S. G.; Afreixo, Vera |
author_facet | Tavares, Ana H. M. P.; Pinho, Armando J.; Silva, Raquel M.; Rodrigues, João M. O. S.; Bastos, Carlos A. C.; Ferreira, Paulo J. S. G.; Afreixo, Vera |
author_sort | Tavares, Ana H. M. P. |
collection | PubMed |
description | We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented. For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance distribution and with clusters of overrepresented short distances. We speculate that patterns of overrepresentation of short distances between symmetric word pairs may allow the occurrence of non-standard DNA conformations, such as hairpin/cruciform structures. We focused on the human genome, and analysed both the complete genome as well as a version with known repetitive sequences masked out. We reported several well-defined features in the distributions of distances, which can be classified into three different profiles, showing enrichment in distinct distance ranges. We analysed in greater detail certain pairs of symmetric words of length seven, found by our procedure, characterised by the surprising fact that they occur at single distances more frequently than expected. |
format | Online Article Text |
id | pubmed-5428789 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-54287892017-05-15 DNA word analysis based on the distribution of the distances between symmetric words Tavares, Ana H. M. P. Pinho, Armando J. Silva, Raquel M. Rodrigues, João M. O. S. Bastos, Carlos A. C. Ferreira, Paulo J. S. G. Afreixo, Vera Sci Rep Article We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented. For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance distribution and with clusters of overrepresented short distances. We speculate that patterns of overrepresentation of short distances between symmetric word pairs may allow the occurrence of non-standard DNA conformations, such as hairpin/cruciform structures. We focused on the human genome, and analysed both the complete genome as well as a version with known repetitive sequences masked out. We reported several well-defined features in the distributions of distances, which can be classified into three different profiles, showing enrichment in distinct distance ranges. We analysed in greater detail certain pairs of symmetric words of length seven, found by our procedure, characterised by the surprising fact that they occur at single distances more frequently than expected. Nature Publishing Group UK 2017-04-07 /pmc/articles/PMC5428789/ /pubmed/28389642 http://dx.doi.org/10.1038/s41598-017-00646-2 Text en © The Author(s) 2017 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Tavares, Ana H. M. P. Pinho, Armando J. Silva, Raquel M. Rodrigues, João M. O. S. Bastos, Carlos A. C. Ferreira, Paulo J. S. G. Afreixo, Vera DNA word analysis based on the distribution of the distances between symmetric words |
title | DNA word analysis based on the distribution of the distances between symmetric words |
title_full | DNA word analysis based on the distribution of the distances between symmetric words |
title_fullStr | DNA word analysis based on the distribution of the distances between symmetric words |
title_full_unstemmed | DNA word analysis based on the distribution of the distances between symmetric words |
title_short | DNA word analysis based on the distribution of the distances between symmetric words |
title_sort | dna word analysis based on the distribution of the distances between symmetric words |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5428789/ https://www.ncbi.nlm.nih.gov/pubmed/28389642 http://dx.doi.org/10.1038/s41598-017-00646-2 |
work_keys_str_mv | AT tavaresanahmp dnawordanalysisbasedonthedistributionofthedistancesbetweensymmetricwords AT pinhoarmandoj dnawordanalysisbasedonthedistributionofthedistancesbetweensymmetricwords AT silvaraquelm dnawordanalysisbasedonthedistributionofthedistancesbetweensymmetricwords AT rodriguesjoaomos dnawordanalysisbasedonthedistributionofthedistancesbetweensymmetricwords AT bastoscarlosac dnawordanalysisbasedonthedistributionofthedistancesbetweensymmetricwords AT ferreirapaulojsg dnawordanalysisbasedonthedistributionofthedistancesbetweensymmetricwords AT afreixovera dnawordanalysisbasedonthedistributionofthedistancesbetweensymmetricwords |