A new approach for interpreting Random Forest models and its application to the biology of ageing
MOTIVATION: This work uses the Random Forest (RF) classification algorithm to predict if a gene is over-expressed, under-expressed or has no change in expression with age in the brain. RFs have high predictive power, and RF models can be interpreted using a feature (variable) importance measure. How...
Main authors: | Fabris, Fabio; Doherty, Aoife; Palmer, Daniel; de Magalhães, João Pedro; Freitas, Alex A |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2018 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6041990/ https://www.ncbi.nlm.nih.gov/pubmed/29462247 http://dx.doi.org/10.1093/bioinformatics/bty087 |
_version_ | 1783339082415341568 |
---|---|
author | Fabris, Fabio Doherty, Aoife Palmer, Daniel de Magalhães, João Pedro Freitas, Alex A |
author_facet | Fabris, Fabio Doherty, Aoife Palmer, Daniel de Magalhães, João Pedro Freitas, Alex A |
author_sort | Fabris, Fabio |
collection | PubMed |
description | MOTIVATION: This work uses the Random Forest (RF) classification algorithm to predict if a gene is over-expressed, under-expressed or has no change in expression with age in the brain. RFs have high predictive power, and RF models can be interpreted using a feature (variable) importance measure. However, current feature importance measures evaluate a feature as a whole (all feature values). We show that, for a popular type of biological data (Gene Ontology-based), usually only one value of a feature is particularly important for classification and the interpretation of the RF model. Hence, we propose a new algorithm for identifying the most important and most informative feature values in an RF model. RESULTS: The new feature importance measure identified highly relevant Gene Ontology terms for the aforementioned gene classification task, producing a feature ranking that is much more informative to biologists than an alternative, state-of-the-art feature importance measure. AVAILABILITY AND IMPLEMENTATION: The dataset and source codes used in this paper are available as ‘Supplementary Material’ and the description of the data can be found at: https://fabiofabris.github.io/bioinfo2018/web/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6041990 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-60419902018-07-17 A new approach for interpreting Random Forest models and its application to the biology of ageing Fabris, Fabio Doherty, Aoife Palmer, Daniel de Magalhães, João Pedro Freitas, Alex A Bioinformatics Original Papers MOTIVATION: This work uses the Random Forest (RF) classification algorithm to predict if a gene is over-expressed, under-expressed or has no change in expression with age in the brain. RFs have high predictive power, and RF models can be interpreted using a feature (variable) importance measure. However, current feature importance measures evaluate a feature as a whole (all feature values). We show that, for a popular type of biological data (Gene Ontology-based), usually only one value of a feature is particularly important for classification and the interpretation of the RF model. Hence, we propose a new algorithm for identifying the most important and most informative feature values in an RF model. RESULTS: The new feature importance measure identified highly relevant Gene Ontology terms for the aforementioned gene classification task, producing a feature ranking that is much more informative to biologists than an alternative, state-of-the-art feature importance measure. AVAILABILITY AND IMPLEMENTATION: The dataset and source codes used in this paper are available as ‘Supplementary Material’ and the description of the data can be found at: https://fabiofabris.github.io/bioinfo2018/web/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2018-07-15 2018-02-16 /pmc/articles/PMC6041990/ /pubmed/29462247 http://dx.doi.org/10.1093/bioinformatics/bty087 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Fabris, Fabio Doherty, Aoife Palmer, Daniel de Magalhães, João Pedro Freitas, Alex A A new approach for interpreting Random Forest models and its application to the biology of ageing |
title | A new approach for interpreting Random Forest models and its application to the biology of ageing |
title_full | A new approach for interpreting Random Forest models and its application to the biology of ageing |
title_fullStr | A new approach for interpreting Random Forest models and its application to the biology of ageing |
title_full_unstemmed | A new approach for interpreting Random Forest models and its application to the biology of ageing |
title_short | A new approach for interpreting Random Forest models and its application to the biology of ageing |
title_sort | new approach for interpreting random forest models and its application to the biology of ageing |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6041990/ https://www.ncbi.nlm.nih.gov/pubmed/29462247 http://dx.doi.org/10.1093/bioinformatics/bty087 |