
Sequence based prediction of enhancer regions from DNA random walk

Regulatory elements play a critical role in the development of eukaryotic organisms by controlling the spatio-temporal pattern of gene expression. Enhancers are one class of these elements; they contribute to the regulation of gene expression through chromatin looping or eRNA expression. Experimental iden...


Bibliographic Details
Main authors: Singh, Anand Pratap, Mishra, Sarthak, Jabin, Suraiya
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6206163/
https://www.ncbi.nlm.nih.gov/pubmed/30374023
http://dx.doi.org/10.1038/s41598-018-33413-y
author Singh, Anand Pratap
Mishra, Sarthak
Jabin, Suraiya
collection PubMed
description Regulatory elements play a critical role in the development of eukaryotic organisms by controlling the spatio-temporal pattern of gene expression. Enhancers are one class of these elements; they contribute to the regulation of gene expression through chromatin looping or eRNA expression. Experimental identification of a novel enhancer is costly, which has driven interest in computational approaches to predicting enhancer regions in a genome. Existing computational approaches have primarily been based on training on high-throughput data such as transcription factor binding sites (TFBS), DNA methylation, and histone modification marks. Purely sequence-based approaches, on the other hand, are promising because they are not biased by the complexity or context specificity of such datasets. In sequence-based approaches, machine learning models are trained either directly on sequences or on sequence features to classify sequences as enhancers or non-enhancers. In this paper, we derived statistical and nonlinear dynamic features, along with k-mer features, from experimentally validated sequences taken from the Vista Enhancer Browser through a random walk model, and applied different machine learning methods to predict whether an input test sequence is an enhancer. Experimental results demonstrate the success of the proposed ensemble-based model, with an area under the curve (AUC) of 0.86, 0.89, and 0.87 in B cells, T cells, and natural killer cells, respectively, for the histone marks dataset.
format Online
Article
Text
id pubmed-6206163
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-62061632018-11-01 Sequence based prediction of enhancer regions from DNA random walk Singh, Anand Pratap Mishra, Sarthak Jabin, Suraiya Sci Rep Article
Nature Publishing Group UK 2018-10-29 /pmc/articles/PMC6206163/ /pubmed/30374023 http://dx.doi.org/10.1038/s41598-018-33413-y Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Sequence based prediction of enhancer regions from DNA random walk
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6206163/
https://www.ncbi.nlm.nih.gov/pubmed/30374023
http://dx.doi.org/10.1038/s41598-018-33413-y
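The feature extraction outlined in the abstract (a random walk over the DNA sequence, summary statistics of that walk, and k-mer counts) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the step rule (purine = +1, pyrimidine = -1), the chosen statistics, and the function names `dna_walk`, `walk_stats`, `kmer_counts`, and `features` are all assumptions made for illustration, and the nonlinear dynamic features mentioned in the abstract are not covered.

```python
from collections import Counter
from itertools import accumulate, product
from statistics import mean, pstdev

# Assumed step rule: purines (A, G) step up, pyrimidines (C, T) step down.
# The record does not state which mapping the paper uses.
STEP = {"A": 1, "G": 1, "C": -1, "T": -1}


def dna_walk(seq):
    """Return the cumulative positions of a 1-D walk over the sequence.

    Raises KeyError on characters other than A, C, G, T (e.g. N).
    """
    return list(accumulate(STEP[base] for base in seq.upper()))


def walk_stats(walk):
    """Summary statistics of the walk (an illustrative feature set)."""
    return {
        "mean": mean(walk),
        "std": pstdev(walk),
        "min": min(walk),
        "max": max(walk),
        "final": walk[-1],
    }


def kmer_counts(seq, k):
    """Count overlapping k-mers as a fixed-order vector over all 4**k k-mers."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


def features(seq, k=2):
    """Concatenate walk statistics and k-mer counts into one feature vector."""
    stats = walk_stats(dna_walk(seq))
    return list(stats.values()) + kmer_counts(seq, k)
```

A vector such as `features("AGCTAG", k=2)` has 5 walk statistics followed by 16 dinucleotide counts and could be passed to any standard classifier; the choice of classifier and of k is left open here.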