
Identifying compositionally homogeneous and nonhomogeneous domains within the human genome using a novel segmentation algorithm

Bibliographic details
Main authors: Elhaik, Eran; Graur, Dan; Josić, Krešimir; Landan, Giddy
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2926622/
https://www.ncbi.nlm.nih.gov/pubmed/20571085
http://dx.doi.org/10.1093/nar/gkq532
Collection: PubMed
Description: It has been suggested that the mammalian genome is composed mainly of long compositionally homogeneous domains. Such domains are frequently identified using recursive segmentation algorithms based on the Jensen–Shannon divergence. However, a common difficulty with such methods is deciding when to halt the recursive partitioning and what criteria to use in deciding whether a detected boundary between two segments is real or not. We demonstrate that commonly used halting criteria are intrinsically biased, and propose IsoPlotter, a parameter-free segmentation algorithm that overcomes such biases by using a simple dynamic halting criterion and tests the homogeneity of the inferred domains. IsoPlotter was compared with an alternative segmentation algorithm, D(JS), using two sets of simulated genomic sequences. Our results show that IsoPlotter was able to infer both long and short compositionally homogeneous domains with low GC content dispersion, whereas D(JS) failed to identify short compositionally homogeneous domains and sequences with low compositional dispersion. By segmenting the human genome with IsoPlotter, we found that one-third of the genome is composed of compositionally nonhomogeneous domains and the remainder is a mixture of many short compositionally homogeneous domains and relatively few long ones.
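For readers unfamiliar with the technique, the recursive Jensen–Shannon segmentation mentioned in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not IsoPlotter and not the D(JS) implementation evaluated in the paper: the sequence is reduced to a binary GC/AT indicator, each segment is cut at the position that maximizes D = H(S) - (n_L/N)·H(L) - (n_R/N)·H(R), where H is the Shannon entropy of the GC fraction, and recursion halts when the best divergence falls below a fixed threshold. The function names and the `threshold` and `min_len` parameters, with their default values, are hypothetical choices for illustration.

```python
import math
from typing import List, Tuple


def binary_entropy(p: float) -> float:
    """Shannon entropy in bits of a two-symbol composition with GC fraction p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def to_gc_indicator(seq: str) -> List[int]:
    """Map G/C to 1 and A/T to 0; characters other than ACGT (e.g. N) are dropped,
    so indices refer to the filtered sequence, not the raw input."""
    return [1 if b in "GC" else 0 for b in seq.upper() if b in "ACGT"]


def best_split(gc: List[int], start: int, end: int, min_len: int) -> Tuple[int, float]:
    """Find the cut i in gc[start:end] maximizing
    D_JS = H(S) - (nL/N) H(L) - (nR/N) H(R).
    Both parts must be at least min_len long. Returns (-1, 0.0) if no cut
    gives a positive divergence."""
    n = end - start
    total_gc = sum(gc[start:end])
    h_total = binary_entropy(total_gc / n)
    best_pos, best_d = -1, 0.0
    left_gc = 0
    for i in range(start + 1, end):
        left_gc += gc[i - 1]  # running GC count of gc[start:i]
        n_left, n_right = i - start, end - i
        if n_left < min_len or n_right < min_len:
            continue
        right_gc = total_gc - left_gc
        d = (h_total
             - (n_left / n) * binary_entropy(left_gc / n_left)
             - (n_right / n) * binary_entropy(right_gc / n_right))
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


def segment(gc: List[int], threshold: float = 0.01, min_len: int = 10) -> List[Tuple[int, int]]:
    """Recursively split gc; return sorted half-open (start, end) segments.

    Hypothetical fixed halting rule: stop splitting a segment when it is
    shorter than 2 * min_len or its best D_JS is below threshold."""
    min_len = max(1, min_len)
    segments: List[Tuple[int, int]] = []
    stack = [(0, len(gc))]
    while stack:
        start, end = stack.pop()
        if end - start < 2 * min_len:
            segments.append((start, end))  # too short to split further
            continue
        pos, d = best_split(gc, start, end, min_len)
        if pos == -1 or d < threshold:
            segments.append((start, end))  # halt: best cut fails the fixed threshold
        else:
            stack.append((pos, end))
            stack.append((start, pos))
    return sorted(segments)


if __name__ == "__main__":
    # A toy sequence: 100 AT-only bases followed by 100 GC-only bases.
    gc = to_gc_indicator("AT" * 50 + "GC" * 50)
    print(segment(gc))  # prints [(0, 100), (100, 200)]
```

A fixed `threshold` like the one above is the kind of static halting criterion the authors argue is intrinsically biased; IsoPlotter's dynamic halting criterion and its homogeneity test are not reproduced here, since the record does not specify them.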
Record ID: pubmed-2926622
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Nucleic Acids Res (Methods Online)
Dates: 2010-08, 2010-06-22
Copyright: © The Author(s) 2010. Published by Oxford University Press.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.