Taking promoters out of enhancers in sequence based predictions of tissue-specific mammalian enhancers

BACKGROUND: Many genetic diseases are caused by mutations in non-coding regions of the genome. These mutations are frequently found in enhancer sequences, causing disruption to the regulatory program of the cell. Enhancers are short regulatory sequences in the non-coding part of the genome that are...

Bibliographic Details
Main authors: Herman-Izycka, Julia, Wlasnowolski, Michal, Wilczynski, Bartek
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5461523/
https://www.ncbi.nlm.nih.gov/pubmed/28589862
http://dx.doi.org/10.1186/s12920-017-0264-3
_version_ 1783242350140588032
author Herman-Izycka, Julia
Wlasnowolski, Michal
Wilczynski, Bartek
author_facet Herman-Izycka, Julia
Wlasnowolski, Michal
Wilczynski, Bartek
author_sort Herman-Izycka, Julia
collection PubMed
description BACKGROUND: Many genetic diseases are caused by mutations in non-coding regions of the genome. These mutations are frequently found in enhancer sequences, causing disruption to the regulatory program of the cell. Enhancers are short regulatory sequences in the non-coding part of the genome that are essential for the proper regulation of transcription. While the experimental methods for identification of such sequences are improving every year, our understanding of the rules behind the enhancer activity has not progressed much in the last decade. This is especially true in case of tissue-specific enhancers, where there are clear problems in predicting specificity of enhancer activity. RESULTS: We show a random-forest based machine learning approach capable of matching the performance of the current state-of-the-art methods for enhancer prediction. Then we show that it is, similarly to other published methods, frequently cross-predicting enhancers as active in different tissues, making it less useful for predicting tissue specific activity. Then we proceed to show that the problem is related to the fact that the enhancer predicting models exhibit a bias towards predicting gene promoters as active enhancers. Then we show that using a two-step classifier can lead to lower cross-prediction between tissues. CONCLUSIONS: We provide whole-genome predictions of human heart and brain enhancers obtained with two-step classifier. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12920-017-0264-3) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5461523
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-54615232017-06-07 Taking promoters out of enhancers in sequence based predictions of tissue-specific mammalian enhancers Herman-Izycka, Julia Wlasnowolski, Michal Wilczynski, Bartek BMC Med Genomics Research BACKGROUND: Many genetic diseases are caused by mutations in non-coding regions of the genome. These mutations are frequently found in enhancer sequences, causing disruption to the regulatory program of the cell. Enhancers are short regulatory sequences in the non-coding part of the genome that are essential for the proper regulation of transcription. While the experimental methods for identification of such sequences are improving every year, our understanding of the rules behind the enhancer activity has not progressed much in the last decade. This is especially true in case of tissue-specific enhancers, where there are clear problems in predicting specificity of enhancer activity. RESULTS: We show a random-forest based machine learning approach capable of matching the performance of the current state-of-the-art methods for enhancer prediction. Then we show that it is, similarly to other published methods, frequently cross-predicting enhancers as active in different tissues, making it less useful for predicting tissue specific activity. Then we proceed to show that the problem is related to the fact that the enhancer predicting models exhibit a bias towards predicting gene promoters as active enhancers. Then we show that using a two-step classifier can lead to lower cross-prediction between tissues. CONCLUSIONS: We provide whole-genome predictions of human heart and brain enhancers obtained with two-step classifier. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12920-017-0264-3) contains supplementary material, which is available to authorized users. 
BioMed Central 2017-05-24 /pmc/articles/PMC5461523/ /pubmed/28589862 http://dx.doi.org/10.1186/s12920-017-0264-3 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Herman-Izycka, Julia
Wlasnowolski, Michal
Wilczynski, Bartek
Taking promoters out of enhancers in sequence based predictions of tissue-specific mammalian enhancers
title Taking promoters out of enhancers in sequence based predictions of tissue-specific mammalian enhancers
title_full Taking promoters out of enhancers in sequence based predictions of tissue-specific mammalian enhancers
title_fullStr Taking promoters out of enhancers in sequence based predictions of tissue-specific mammalian enhancers
title_full_unstemmed Taking promoters out of enhancers in sequence based predictions of tissue-specific mammalian enhancers
title_short Taking promoters out of enhancers in sequence based predictions of tissue-specific mammalian enhancers
title_sort taking promoters out of enhancers in sequence based predictions of tissue-specific mammalian enhancers
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5461523/
https://www.ncbi.nlm.nih.gov/pubmed/28589862
http://dx.doi.org/10.1186/s12920-017-0264-3
work_keys_str_mv AT hermanizyckajulia takingpromotersoutofenhancersinsequencebasedpredictionsoftissuespecificmammalianenhancers
AT wlasnowolskimichal takingpromotersoutofenhancersinsequencebasedpredictionsoftissuespecificmammalianenhancers
AT wilczynskibartek takingpromotersoutofenhancersinsequencebasedpredictionsoftissuespecificmammalianenhancers