
DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest

DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions an...


Bibliographic Details
Main Authors: Manavalan, Balachandran, Shin, Tae Hwan, Lee, Gwang
Format: Online Article Text
Language: English
Published: Impact Journals LLC 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5788611/
https://www.ncbi.nlm.nih.gov/pubmed/29416743
http://dx.doi.org/10.18632/oncotarget.23099
author Manavalan, Balachandran
Shin, Tae Hwan
Lee, Gwang
collection PubMed
description DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html
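The description names nucleotide composition as one of the feature families and reports performance as a Matthews correlation coefficient (MCC) and accuracy. As a minimal sketch of those two ingredients only, the Python below computes k-mer composition for a DNA sequence and the two metrics from confusion-matrix counts. It is not the authors' implementation: the function names are hypothetical, the physicochemical features, random forest selection, and SVM training are not reproduced, and the standard textbook definitions of MCC and accuracy are assumed because the record does not spell them out.

```python
from itertools import product
from math import sqrt


def kmer_composition(seq, k=1):
    """Relative frequency of every DNA k-mer in seq, in fixed ACGT order.

    Hypothetical helper: k=1 gives mononucleotide composition,
    k=2 dinucleotide composition, and so on.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(max(len(seq) - k + 1, 0)):
        word = seq[i:i + k]
        if word in counts:  # skip windows with N or other ambiguity codes
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

A feature vector built this way has 4^k entries per sequence (4 for k=1, 16 for k=2), which is the kind of fixed-length input that a feature selection step and a classifier can consume.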
format Online
Article
Text
id pubmed-5788611
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Impact Journals LLC
record_format MEDLINE/PubMed
spelling pubmed-5788611 2018-02-07 DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest Manavalan, Balachandran Shin, Tae Hwan Lee, Gwang Oncotarget Research Paper Impact Journals LLC 2017-12-08 /pmc/articles/PMC5788611/ /pubmed/29416743 http://dx.doi.org/10.18632/oncotarget.23099 Text en
Copyright: © 2018 Manavalan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 (CC BY 3.0, http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest
topic Research Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5788611/
https://www.ncbi.nlm.nih.gov/pubmed/29416743
http://dx.doi.org/10.18632/oncotarget.23099