High-complexity regions in mammalian genomes are enriched for developmental genes
Main authors: | , , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6546125/ https://www.ncbi.nlm.nih.gov/pubmed/30395202 http://dx.doi.org/10.1093/bioinformatics/bty922 |
Summary: | MOTIVATION: Unique sequence regions are associated with genetic function in vertebrate genomes. However, measuring uniqueness, or the absence of long repeats, along a genome is conceptually and computationally difficult. Here we use a variant of the Lempel-Ziv complexity, the match complexity, [Formula: see text], and augment it by deriving its null distribution for random sequences. We then apply [Formula: see text] to the human and mouse genomes to investigate the relationship between sequence complexity and function. RESULTS: We implemented [Formula: see text] in the program macle and show through simulation that the newly derived null distribution of [Formula: see text] is accurate. This allows us to delineate high-complexity regions in the human and mouse genomes. Using our program macle2go, we find that these regions are twofold enriched for genes. Moreover, the genes contained in these regions are more than 10-fold enriched for developmental functions. AVAILABILITY AND IMPLEMENTATION: Source code for macle and macle2go is available from www.github.com/evolbioinf/macle and www.github.com/evolbioinf/macle2go, respectively; [Formula: see text] browser tracks are available from guanine.evolbio.mgp.de/complexity. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
---|
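
The summary refers to a Lempel-Ziv-type complexity evaluated along a genome. As a rough illustration of the general idea only, the sketch below counts factors in a simple Lempel-Ziv-style parse and reports factors per base in sliding windows. It is not the match complexity implemented in macle: the function names `lz_factor_count` and `windowed_factor_density` are hypothetical, matches are searched only inside each window, and no null distribution is modelled.

```python
def lz_factor_count(seq: str) -> int:
    """Count factors in a simple Lempel-Ziv-style parse of seq.

    Each factor is the longest prefix of the unparsed suffix that also
    occurs starting at an earlier position (overlap allowed), extended by
    one further character. Repetitive sequence yields few factors.
    """
    n = len(seq)
    i = 0
    factors = 0
    while i < n:
        length = 0
        # Extend the match while seq[i:i+length+1] also starts before i.
        # The end bound i+length forces any hit to start at i-1 or earlier.
        while i + length < n and seq.find(seq[i:i + length + 1], 0, i + length) != -1:
            length += 1
        # Close the factor: matched part plus one new character.
        i += length + 1
        factors += 1
    return factors


def windowed_factor_density(seq: str, window: int, step: int) -> list[float]:
    """Factors per base for each sliding window (naive, window-local)."""
    return [
        lz_factor_count(seq[start:start + window]) / window
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # A repetitive window scores lower than a non-repetitive one.
    print(windowed_factor_density("AAAAACGT", window=4, step=4))
```

The naive search is quadratic or worse in the window length, so this is only suitable for short toy sequences; genome-scale tools rely on index structures instead.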