eRFSVM: a hybrid classifier to predict enhancers-integrating random forests with support vector machines
Main authors: | Huang, Fang; Shen, Jiawei; Guo, Qingli; Shi, Yongyong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5226099/ https://www.ncbi.nlm.nih.gov/pubmed/28096768 http://dx.doi.org/10.1186/s41065-016-0012-2 |
author | Huang, Fang; Shen, Jiawei; Guo, Qingli; Shi, Yongyong |
collection | PubMed |
description | BACKGROUND: Enhancers are tissue-specific distal regulatory elements that play vital roles in gene regulation and expression. Predicting and identifying enhancers are important but challenging problems in bioinformatics. Existing computational methods, mostly single classifiers, can only predict enhancers defined by binding of the transcriptional coactivator EP300, and they generalize poorly. RESULTS: In this study we built a hybrid classifier, eRFSVM, that uses random forests as base classifiers and a support vector machine as the main classifier. eRFSVM integrates two components, eRFSVM-ENCODE and eRFSVM-FANTOM5, which use different features and labels. Each base classifier is a random forest trained on data from a single tissue or cell type. The main classifier, a support vector machine, makes the final decision using the predictions of the base classifiers as inputs. For eRFSVM-ENCODE, we trained on data from the cell lines Gm12878, Hep, H1-hesc and Huvec, using ChIP-Seq datasets as features and EP300-based enhancers as labels. Tested on the K562 dataset, eRFSVM-ENCODE achieved a precision of 83.69 %, considerably higher than that of existing classifiers. For eRFSVM-FANTOM5, which uses enhancers identified from RNA data in the FANTOM5 project as labels, eRFSVM reached a precision, recall, F-score and accuracy of 86.17 %, 36.06 %, 50.84 % and 93.38 %, respectively. Relative to the existing algorithm (69.92 %, 18.30 %, 28.74 % and 89.20 %), these are increases of 23.24 %, 97.05 %, 76.90 % and 4.69 %. CONCLUSIONS: These results demonstrate that eRFSVM is a better classifier for predicting both EP300-based and FANTOM5 RNA-based enhancers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s41065-016-0012-2) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5226099 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5226099, 2017-01-17; Hereditas; Research; BioMed Central, 2016-06-30; Text; en. © The Author(s) 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | eRFSVM: a hybrid classifier to predict enhancers-integrating random forests with support vector machines |
topic | Research |
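The description outlines a two-level (stacked) design in which random forests trained on individual cell lines produce scores that a support vector machine combines into a final decision. The following is a minimal, dependency-free Python sketch of that layout, not the authors' implementation. Several details are assumptions: each forest is built from depth-1 trees (stumps) instead of full decision trees, the main classifier is a linear SVM trained with the Pegasos subgradient method, the SVM is fit on held-out samples not used to train the forests, and all function names and the toy data generator are hypothetical.

```python
import random

# Hypothetical sketch of an eRFSVM-like stack, NOT the authors' code:
# one random forest per training cell line (base level) feeds a linear SVM
# (main level). Depth-1 trees (stumps) stand in for full decision trees.


def train_stump(X, y, features):
    """Return the (feature, threshold, label_if_above) with fewest training errors."""
    best = None
    for j in features:
        for t in sorted({row[j] for row in X}):
            for above in (0, 1):
                errors = sum((above if row[j] > t else 1 - above) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, above)
    return best[1:]


def train_forest(X, y, rng, n_trees=15, max_features=2):
    """Random-forest-style ensemble: bootstrap rows plus a random feature subset per tree."""
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        feats = rng.sample(range(len(X[0])), max_features)
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def forest_score(forest, row):
    """Fraction of trees voting 'enhancer' (label 1)."""
    return sum(above if row[j] > t else 1 - above for j, t, above in forest) / len(forest)


def meta_features(forests, row):
    """Base-classifier outputs, centered at 0, plus a constant 1.0 bias feature."""
    return [forest_score(f, row) - 0.5 for f in forests] + [1.0]


def train_linear_svm(Z, y, rng, lam=0.01, epochs=200):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss."""
    data = [(z, 2 * label - 1) for z, label in zip(Z, y)]  # labels {0,1} -> {-1,+1}
    w = [0.0] * len(Z[0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, s in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = s * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * s * xi for wi, xi in zip(w, x)]
    return w


def train_stack(cell_datasets, meta_X, meta_y, seed=0):
    """Fit one forest per cell line, then the SVM on held-out meta samples."""
    rng = random.Random(seed)
    forests = [train_forest(X, y, rng) for X, y in cell_datasets]
    Z = [meta_features(forests, row) for row in meta_X]
    return forests, train_linear_svm(Z, meta_y, rng)


def predict(model, row):
    forests, w = model
    return 1 if sum(wi * zi for wi, zi in zip(w, meta_features(forests, row))) >= 0 else 0


def make_toy_cell_line(n, shift, rng):
    """Hypothetical data: 3 'signal' features; enhancers are elevated in the first two."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label + shift, 1.0),
                  rng.gauss(2.0 * label, 1.0),
                  rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(42)
    train_lines = [make_toy_cell_line(80, s, rng) for s in (0.0, 0.3, -0.3)]
    meta_X, meta_y = make_toy_cell_line(100, 0.0, rng)
    model = train_stack(train_lines, meta_X, meta_y)
    test_X, test_y = make_toy_cell_line(200, 0.1, rng)  # unseen "cell line"
    acc = sum(predict(model, r) == t for r, t in zip(test_X, test_y)) / len(test_y)
    print(f"held-out accuracy: {acc:.2f}")
```

The held-out toy "cell line" mirrors, only loosely, the idea of training on some cell lines and testing on another (K562 in the abstract). Swapping in `sklearn.ensemble.RandomForestClassifier` and `sklearn.svm.SVC` would give full trees and kernel SVMs; the standard-library version is used here only so the sketch runs without dependencies.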
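The FANTOM5 comparison in the description reports relative rather than absolute gains: each increase is (new - old) / old, with the existing algorithm's value given in parentheses. The short Python sketch below, whose helper names are hypothetical, recomputes those figures from the reported values and derives the F-score as the harmonic mean of precision and recall.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted enhancers that are true enhancers."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of true enhancers that were recovered."""
    return tp / (tp + fn)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all regions classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * p * r / (p + r)


def relative_increase(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old`."""
    return 100 * (new - old) / old


if __name__ == "__main__":
    # eRFSVM-FANTOM5 vs. the existing algorithm; values (in %) from the abstract.
    reported = {"precision": (86.17, 69.92), "recall": (36.06, 18.30),
                "F-score": (50.84, 28.74), "accuracy": (93.38, 89.20)}
    for name, (new, old) in reported.items():
        print(f"{name}: +{relative_increase(new, old):.2f} %")
    print(f"F1 from P and R: {f_score(86.17, 36.06):.2f}")
```

Run as a script, it prints increases of 23.24 %, 97.05 %, 76.90 % and 4.69 %, matching the abstract, and an F-score of 50.84 % for eRFSVM computed from its precision and recall.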