WordCluster: detecting clusters of DNA words and genomic elements

BACKGROUND: Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well-established examples are genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statisti...

Bibliographic Details
Main authors: Hackenberg, Michael; Carpena, Pedro; Bernaola-Galván, Pedro; Barturen, Guillermo; Alganza, Ángel M; Oliver, José L
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3037320/
https://www.ncbi.nlm.nih.gov/pubmed/21261981
http://dx.doi.org/10.1186/1748-7188-6-2
_version_ 1782197972182761472
author Hackenberg, Michael
Carpena, Pedro
Bernaola-Galván, Pedro
Barturen, Guillermo
Alganza, Ángel M
Oliver, José L
author_facet Hackenberg, Michael
Carpena, Pedro
Bernaola-Galván, Pedro
Barturen, Guillermo
Alganza, Ángel M
Oliver, José L
author_sort Hackenberg, Michael
collection PubMed
description BACKGROUND: Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well-established examples are genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. RESULTS: We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method in a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between the inside and the outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. CONCLUSIONS: WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method in a web server is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php, including additional features such as the detection of co-localization with gene regions and an annotation enrichment tool for the functional analysis of overlapped genes.
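The idea in RESULTS (chain consecutive copies by distance, then attach a statistical significance to each chain) can be illustrated with a minimal Python sketch. It is not the WordCluster implementation: the function names, the fixed `max_gap` threshold, the per-position copy probability `p`, and the binomial null model used for the p-value are all assumptions made for this illustration, since the abstract does not specify how the distance threshold or the significance is computed.

```python
from math import comb


def word_positions(seq, word):
    """Return start positions of every (possibly overlapping) copy of word."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return positions


def chain_by_distance(positions, max_gap):
    """Group sorted positions; a new cluster starts when a gap exceeds max_gap."""
    clusters = []
    for pos in positions:
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def cluster_p_value(n_copies, span, p):
    """P(at least n_copies - 1 further copies within `span` positions).

    Assumed null model (not taken from the paper): after the first copy,
    each position independently starts a copy with probability p, so the
    p-value is a binomial upper tail.
    """
    lower_tail = sum(
        comb(span, k) * p**k * (1 - p) ** (span - k)
        for k in range(n_copies - 1)
    )
    return max(0.0, 1.0 - lower_tail)


def significant_clusters(seq, word, max_gap, p, alpha=0.05):
    """Return (start, end, copies, p_value) for clusters with p_value <= alpha."""
    results = []
    for cluster in chain_by_distance(word_positions(seq, word), max_gap):
        span = cluster[-1] - cluster[0]
        p_value = cluster_p_value(len(cluster), span, p)
        if p_value <= alpha:
            results.append((cluster[0], cluster[-1], len(cluster), p_value))
    return results
```

In this sketch `p` would typically be estimated from the overall frequency of the word in the sequence (copies divided by sequence length), and the same chaining step works for any list of genomic coordinates, not only k-mer positions.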
format Text
id pubmed-3037320
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3037320 2011-02-18 WordCluster: detecting clusters of DNA words and genomic elements Hackenberg, Michael Carpena, Pedro Bernaola-Galván, Pedro Barturen, Guillermo Alganza, Ángel M Oliver, José L Algorithms Mol Biol Software Article
BioMed Central 2011-01-24 /pmc/articles/PMC3037320/ /pubmed/21261981 http://dx.doi.org/10.1186/1748-7188-6-2 Text en Copyright ©2011 Hackenberg et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software Article
Hackenberg, Michael
Carpena, Pedro
Bernaola-Galván, Pedro
Barturen, Guillermo
Alganza, Ángel M
Oliver, José L
WordCluster: detecting clusters of DNA words and genomic elements
title WordCluster: detecting clusters of DNA words and genomic elements
title_full WordCluster: detecting clusters of DNA words and genomic elements
title_fullStr WordCluster: detecting clusters of DNA words and genomic elements
title_full_unstemmed WordCluster: detecting clusters of DNA words and genomic elements
title_short WordCluster: detecting clusters of DNA words and genomic elements
title_sort wordcluster: detecting clusters of dna words and genomic elements
topic Software Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3037320/
https://www.ncbi.nlm.nih.gov/pubmed/21261981
http://dx.doi.org/10.1186/1748-7188-6-2
work_keys_str_mv AT hackenbergmichael wordclusterdetectingclustersofdnawordsandgenomicelements
AT carpenapedro wordclusterdetectingclustersofdnawordsandgenomicelements
AT bernaolagalvanpedro wordclusterdetectingclustersofdnawordsandgenomicelements
AT barturenguillermo wordclusterdetectingclustersofdnawordsandgenomicelements
AT alganzaangelm wordclusterdetectingclustersofdnawordsandgenomicelements
AT oliverjosel wordclusterdetectingclustersofdnawordsandgenomicelements