
High-complexity regions in mammalian genomes are enriched for developmental genes

Bibliographic Details
Main Authors: Pirogov, Anton; Pfaffelhuber, Peter; Börsch-Haubold, Angelika; Haubold, Bernhard
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2019
Subjects: Original Papers
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6546125/
https://www.ncbi.nlm.nih.gov/pubmed/30395202
http://dx.doi.org/10.1093/bioinformatics/bty922

Abstract:
MOTIVATION: Unique sequence regions are associated with genetic function in vertebrate genomes. However, measuring uniqueness, or absence of long repeats, along a genome is conceptually and computationally difficult. Here we use a variant of the Lempel-Ziv complexity, the match complexity, [Formula: see text], and augment it by deriving its null distribution for random sequences. We then apply the match complexity to the human and mouse genomes to investigate the relationship between sequence complexity and function.
RESULTS: We implemented the match complexity in the program macle and show through simulation that its newly derived null distribution is accurate. This allows us to delineate high-complexity regions in the human and mouse genomes. Using our program macle2go, we find that these regions are twofold enriched for genes. Moreover, the genes contained in these regions are more than 10-fold enriched for developmental functions.
AVAILABILITY AND IMPLEMENTATION: Source code for macle and macle2go is available from www.github.com/evolbioinf/macle and www.github.com/evolbioinf/macle2go, respectively; match complexity browser tracks are available from guanine.evolbio.mgp.de/complexity.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
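
The abstract describes the match complexity as a variant of Lempel-Ziv complexity that measures the absence of long repeats, but this record gives no formula for it. As a rough illustration only, the Python sketch below counts factors in a simplified match factorization: starting at each position, it takes the longest prefix that also occurs at some other position of the same string, adds one character, and moves on. Long repeats yield few, long factors (low complexity); repeat-free sequence yields many, short factors. This is an assumption-laden sketch, not the macle implementation or the paper's exact definition: the function names are hypothetical, matches are searched only within the input string and on one strand, and no normalization or null distribution is applied.

```python
def occurs_elsewhere(s: str, i: int, length: int) -> bool:
    """Return True if s[i:i + length] also starts at some position j != i.

    Hypothetical helper; occurrences may overlap position i.
    """
    pattern = s[i:i + length]
    j = s.find(pattern)
    while j != -1:
        if j != i:
            return True
        j = s.find(pattern, j + 1)  # keep looking, allowing overlaps
    return False


def match_factor_count(s: str) -> int:
    """Count factors in a simplified match factorization of s (hypothetical).

    Each factor is the longest prefix of s[i:] that also occurs at another
    position of s, extended by one character.
    """
    n = len(s)
    i = 0
    count = 0
    while i < n:
        length = 0
        # Grow the match while the next-longer prefix still occurs elsewhere.
        while i + length < n and occurs_elsewhere(s, i, length + 1):
            length += 1
        # Consume the match plus one extra character (clipped at the end of s).
        i += length + 1
        count += 1
    return count


def windowed_counts(s: str, window: int, step: int) -> list[int]:
    """Factor counts for sliding windows; each window is factorized on its own."""
    return [match_factor_count(s[k:k + window])
            for k in range(0, len(s) - window + 1, step)]


if __name__ == "__main__":
    print(match_factor_count("ACGTACGTACGTACGT"))     # tandem repeat: 2
    print(match_factor_count("ACGT"))                 # no repeats: 4
    print(windowed_counts("AAAAAAAAACGTTGCA", 8, 8))  # [1, 4]
```

In the windowed example, the homopolymer window collapses into a single factor while the second window needs four, so a higher factor count marks the less repetitive, higher-complexity region. The naive search is cubic in the window length and is meant only for small illustrative inputs; genome-scale tools typically rely on index structures such as suffix arrays instead.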
Journal: Bioinformatics, Original Papers (Oxford University Press); dated 2019-06-01, first published 2018-11-05.
Rights: © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.