Sequence based prediction of enhancer regions from DNA random walk
Regulatory elements play a critical role in the developmental process of eukaryotic organisms by controlling the spatio-temporal pattern of gene expression. An enhancer is one such element; it contributes to the regulation of gene expression through chromatin looping or eRNA expression. Experimental iden...
Main Authors: | Singh, Anand Pratap; Mishra, Sarthak; Jabin, Suraiya |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2018 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6206163/ https://www.ncbi.nlm.nih.gov/pubmed/30374023 http://dx.doi.org/10.1038/s41598-018-33413-y |
_version_ | 1783366315800526848 |
---|---|
author | Singh, Anand Pratap Mishra, Sarthak Jabin, Suraiya |
author_facet | Singh, Anand Pratap Mishra, Sarthak Jabin, Suraiya |
author_sort | Singh, Anand Pratap |
collection | PubMed |
description | Regulatory elements play a critical role in the developmental process of eukaryotic organisms by controlling the spatio-temporal pattern of gene expression. An enhancer is one such element; it contributes to the regulation of gene expression through chromatin looping or eRNA expression. Experimental identification of a novel enhancer is costly, which has driven interest in computational approaches to predicting enhancer regions in a genome. Existing computational approaches have primarily been based on training on high-throughput data such as transcription factor binding sites (TFBS), DNA methylation, and histone modification marks. Purely sequence-based approaches to predicting enhancer regions, on the other hand, are promising because they are not biased by the complexity or context specificity of such datasets. In sequence-based approaches, machine learning models are trained either directly on sequences or on sequence features to classify sequences as enhancers or non-enhancers. In this paper, we derived statistical and nonlinear dynamic features, along with k-mer features, through a random walk model from experimentally validated sequences taken from the Vista Enhancer Browser, and applied different machine learning methods to predict whether an input test sequence is an enhancer or not. Experimental results demonstrate the success of the proposed model based on an ensemble method, with an area under the curve (AUC) of 0.86, 0.89, and 0.87 in B cells, T cells, and natural killer cells for the histone marks dataset. |
format | Online Article Text |
id | pubmed-6206163 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-62061632018-11-01 Sequence based prediction of enhancer regions from DNA random walk Singh, Anand Pratap Mishra, Sarthak Jabin, Suraiya Sci Rep Article
Nature Publishing Group UK 2018-10-29 /pmc/articles/PMC6206163/ /pubmed/30374023 http://dx.doi.org/10.1038/s41598-018-33413-y Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Singh, Anand Pratap Mishra, Sarthak Jabin, Suraiya Sequence based prediction of enhancer regions from DNA random walk |
title | Sequence based prediction of enhancer regions from DNA random walk |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6206163/ https://www.ncbi.nlm.nih.gov/pubmed/30374023 http://dx.doi.org/10.1038/s41598-018-33413-y |
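
The description in this record mentions deriving statistical features from a DNA random walk together with k-mer features. The record does not give the mapping or the feature definitions, so the following is only a minimal sketch under stated assumptions: a one-dimensional walk where purines (A, G) step +1 and pyrimidines (C, T) step -1, a handful of simple summary statistics, and overlapping k-mer counts. The function names `dna_random_walk`, `walk_features`, and `kmer_counts` are hypothetical and are not taken from the article.

```python
from collections import Counter
from itertools import product

# Assumed step rule (not stated in this record): purines +1, pyrimidines -1.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_random_walk(seq):
    """Return the cumulative position after each base of the sequence."""
    walk, pos = [], 0
    for base in seq.upper():
        pos += STEP.get(base, 0)  # unknown symbols such as N leave the position unchanged
        walk.append(pos)
    return walk


def walk_features(seq):
    """Simple summary statistics of the walk (illustrative choices only)."""
    walk = dna_random_walk(seq)
    if not walk:
        raise ValueError("sequence must not be empty")
    n = len(walk)
    mean = sum(walk) / n
    variance = sum((x - mean) ** 2 for x in walk) / n  # population variance
    return {
        "mean": mean,
        "variance": variance,
        "min": min(walk),
        "max": max(walk),
        "end": walk[-1],
    }


def kmer_counts(seq, k=2):
    """Count overlapping k-mers, reported for every A/C/G/T k-mer in fixed order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {"".join(p): counts.get("".join(p), 0) for p in product("ACGT", repeat=k)}


if __name__ == "__main__":
    example = "AGCT"
    print(dna_random_walk(example))
    print(walk_features(example))
    print(kmer_counts(example, k=2))
```

Feature dictionaries of this kind could be concatenated into one vector per sequence and passed to any classifier; which statistics, nonlinear dynamic measures, and classifiers the authors actually used is documented in the article itself, not in this record.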