Learning accurate and interpretable models based on regularized random forests regression

BACKGROUND: Many biology related research works combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. Thus it will be beneficial to have an effective algorithm that can simultane...

Bibliographic Details
Main authors: Liu, Sheng, Dissanayake, Shamitha, Patel, Sanjay, Dang, Xin, Mlsna, Todd, Chen, Yixin, Wilkins, Dawn
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4243592/
https://www.ncbi.nlm.nih.gov/pubmed/25350120
http://dx.doi.org/10.1186/1752-0509-8-S3-S5
_version_ 1782346124175081472
author Liu, Sheng
Dissanayake, Shamitha
Patel, Sanjay
Dang, Xin
Mlsna, Todd
Chen, Yixin
Wilkins, Dawn
collection PubMed
description BACKGROUND: Many biology-related studies combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. An effective algorithm that can simultaneously extract decision rules and select critical features would therefore be beneficial, supporting interpretation while preserving prediction performance. METHODS: In this study, we focus on regression problems for biological data where target outcomes are continuous. In general, models constructed with linear regression are relatively easy to interpret. However, many practical biological applications are inherently nonlinear, and a direct linear relationship between input and output can rarely be found. Nonlinear regression techniques can reveal nonlinear relationships in the data, but are generally hard for humans to interpret. We propose a rule-based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from the generated random forests and eliminates unimportant features. RESULTS: We tested the approach on some biological data sets. The proposed approach is able to construct a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of random forests regression. CONCLUSION: The approach demonstrates high potential for aiding prediction and interpretation of nonlinear relationships in the subject being studied.
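The general idea in the METHODS paragraph (grow a forest, treat each leaf as a rule, then let a 1-norm penalty keep only a few rules) can be illustrated with a toy sketch. This is not the authors' implementation: the depth-1 trees, the coordinate-descent solver, the synthetic data, and every name below (`fit_stump`, `grow_rules`, `fires`, `lasso`, `lam`) are assumptions made for illustration only.

```python
import random

def fit_stump(X, y, feature):
    """Return the squared-error-optimal threshold on one feature, or None."""
    values = sorted(set(row[feature] for row in X))
    best_t, best_err = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for row, yi in zip(X, y) if row[feature] <= t]
        right = [yi for row, yi in zip(X, y) if row[feature] > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def grow_rules(X, y, n_trees, rng):
    """Toy forest: one stump per bootstrap sample on a random feature.
    Each stump contributes two rules, one per leaf."""
    n, d = len(X), len(X[0])
    rules = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        f = rng.randrange(d)
        t = fit_stump([X[i] for i in idx], [y[i] for i in idx], f)
        if t is not None:
            rules += [(f, t, "<="), (f, t, ">")]
    return rules

def fires(rule, row):
    """True when the row satisfies the rule's condition."""
    f, t, op = rule
    return row[f] <= t if op == "<=" else row[f] > t

def lasso(R, y, lam, n_iter=50):
    """Coordinate descent for 0.5 * sum((y - b - R w)^2) + lam * ||w||_1."""
    n, m = len(R), len(R[0])
    w, b = [0.0] * m, sum(y) / n
    resid = [yi - b for yi in y]
    for _ in range(n_iter):
        for j in range(m):
            col = [R[i][j] for i in range(n)]
            norm = sum(c * c for c in col)
            if norm == 0:
                continue
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid))
            # Soft-thresholding sets weak rules to exactly zero.
            new = max(abs(rho) - lam, 0.0) / norm * (1 if rho > 0 else -1)
            resid = [r - c * (new - w[j]) for c, r in zip(col, resid)]
            w[j] = new
        shift = sum(resid) / n  # refit the unpenalized intercept
        b += shift
        resid = [r - shift for r in resid]
    return w, b

# Synthetic nonlinear target that depends only on features 0 and 1 (made up).
rng = random.Random(0)
X = [[rng.uniform(0, 1) for _ in range(5)] for _ in range(120)]
y = [(2.0 if r[0] > 0.5 else 0.0) + (1.0 if r[1] > 0.3 else 0.0)
     + rng.gauss(0, 0.1) for r in X]

rules = grow_rules(X, y, n_trees=30, rng=rng)
R = [[1.0 if fires(rule, row) else 0.0 for rule in rules] for row in X]
w, b = lasso(R, y, lam=2.0)

kept = [(rule, wj) for rule, wj in zip(rules, w) if wj != 0.0]
features = sorted({rule[0] for rule, _ in kept})
pred = [b + sum(wj * Rij for wj, Rij in zip(w, Ri)) for Ri in R]
sse = sum((p - yi) ** 2 for p, yi in zip(pred, y))
sst = sum((yi - sum(y) / len(y)) ** 2 for yi in y)
print(len(rules), "candidate rules,", len(kept), "kept; features used:", features)
```

Rules whose weight is driven to zero drop out of the model, and a feature that appears in no surviving rule is eliminated with them, which is the sense in which rule extraction and feature selection happen together in this sketch.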
format Online
Article
Text
id pubmed-4243592
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4243592 2014-11-26 Learning accurate and interpretable models based on regularized random forests regression Liu, Sheng Dissanayake, Shamitha Patel, Sanjay Dang, Xin Mlsna, Todd Chen, Yixin Wilkins, Dawn BMC Syst Biol Research BioMed Central 2014-10-22 /pmc/articles/PMC4243592/ /pubmed/25350120 http://dx.doi.org/10.1186/1752-0509-8-S3-S5 Text en Copyright © 2014 Liu et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Learning accurate and interpretable models based on regularized random forests regression
topic Research