Learning accurate and interpretable models based on regularized random forests regression
BACKGROUND: Many biology-related studies combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. Thus it will be beneficial to have an effective algorithm that can simultane...
Main authors: | Liu, Sheng; Dissanayake, Shamitha; Patel, Sanjay; Dang, Xin; Mlsna, Todd; Chen, Yixin; Wilkins, Dawn |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4243592/ https://www.ncbi.nlm.nih.gov/pubmed/25350120 http://dx.doi.org/10.1186/1752-0509-8-S3-S5 |
_version_ | 1782346124175081472 |
---|---|
author | Liu, Sheng; Dissanayake, Shamitha; Patel, Sanjay; Dang, Xin; Mlsna, Todd; Chen, Yixin; Wilkins, Dawn |
author_sort | Liu, Sheng |
collection | PubMed |
description | BACKGROUND: Many biology-related studies combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. Thus it will be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving prediction performance. METHODS: In this study, we focus on regression problems for biological data where target outcomes are continuous. In general, models constructed with linear regression approaches are relatively easy to interpret. However, many practical biological applications are nonlinear in essence, and a direct linear relationship between input and output can hardly be found. Nonlinear regression techniques can reveal nonlinear relationships in the data, but are generally hard for humans to interpret. We propose a rule-based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from the generated random forests and eliminates unimportant features. RESULTS: We tested the approach on some biological data sets. The proposed approach is able to construct a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of random forests regression. CONCLUSION: The approach demonstrates high potential in aiding prediction and interpretation of nonlinear relationships of the subject being studied. |
format | Online Article Text |
id | pubmed-4243592 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4243592 2014-11-26 Learning accurate and interpretable models based on regularized random forests regression Liu, Sheng; Dissanayake, Shamitha; Patel, Sanjay; Dang, Xin; Mlsna, Todd; Chen, Yixin; Wilkins, Dawn BMC Syst Biol Research BioMed Central 2014-10-22 /pmc/articles/PMC4243592/ /pubmed/25350120 http://dx.doi.org/10.1186/1752-0509-8-S3-S5 Text en Copyright © 2014 Liu et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Learning accurate and interpretable models based on regularized random forests regression |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4243592/ https://www.ncbi.nlm.nih.gov/pubmed/25350120 http://dx.doi.org/10.1186/1752-0509-8-S3-S5 |
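The METHODS summary in the description says that a 1-norm penalty is used to keep a small number of rules and to eliminate unimportant features. The sketch below illustrates only that selection step, and it is not the authors' implementation. It assumes a hypothetical toy data set, hand-enumerated threshold rules standing in for rules extracted from a random forest, and a plain coordinate-descent lasso; the names `select_rules`, `soft_threshold`, and `lam` are illustrative.

```python
from itertools import product


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def select_rules(X, y, rules, lam, n_passes=100):
    """Lasso by coordinate descent on centered rule indicators.

    Minimizes (1/2n) * ||y - b - R w||^2 + lam * ||w||_1, where column j of R
    is 1 when rule j (feature > threshold) fires on a sample and 0 otherwise.
    """
    n = len(y)
    y_mean = sum(y) / n
    cols, means = [], []
    for feat, thr in rules:
        raw = [1.0 if row[feat] > thr else 0.0 for row in X]
        m = sum(raw) / n
        means.append(m)
        cols.append([v - m for v in raw])  # centering absorbs the intercept
    w = [0.0] * len(rules)
    resid = [yi - y_mean for yi in y]
    for _ in range(n_passes):
        for j, col in enumerate(cols):
            var = sum(c * c for c in col) / n
            if var == 0.0:
                continue  # rule fires on all samples or on none
            # Covariance of rule j with the residual that excludes rule j.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, resid)) / n
            new_w = soft_threshold(rho, lam) / var
            if new_w != w[j]:
                delta = new_w - w[j]
                resid = [r - delta * c for r, c in zip(resid, col)]
                w[j] = new_w
    intercept = y_mean - sum(wj * m for wj, m in zip(w, means))
    return intercept, w


# Hypothetical toy data: a balanced grid over three features, where the
# target depends on feature 0 only.
levels = [0.1, 0.3, 0.5, 0.7, 0.9]
X = [list(p) for p in product(levels, repeat=3)]
y = [3.0 if row[0] > 0.4 else 1.0 for row in X]

# Candidate rules (feature index, threshold), a stand-in for forest rules.
rules = [(f, t) for f in range(3) for t in (0.2, 0.4, 0.6, 0.8)]

intercept, w = select_rules(X, y, rules, lam=0.05)
kept = [(rule, wj) for rule, wj in zip(rules, w) if wj != 0.0]
features_used = sorted({rule[0] for rule, _ in kept})
print("kept rules:", kept)
print("features used:", features_used)
```

On this toy data, only the rule on feature 0 with threshold 0.4 keeps a nonzero weight, so features 1 and 2 drop out of the model. The penalty also shrinks the surviving weight below the true step size of 2, which is the usual trade-off of 1-norm regularization.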