A new approach for interpreting Random Forest models and its application to the biology of ageing

MOTIVATION: This work uses the Random Forest (RF) classification algorithm to predict if a gene is over-expressed, under-expressed or has no change in expression with age in the brain. RFs have high predictive power, and RF models can be interpreted using a feature (variable) importance measure. How...

Bibliographic Details
Main authors: Fabris, Fabio; Doherty, Aoife; Palmer, Daniel; de Magalhães, João Pedro; Freitas, Alex A
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6041990/
https://www.ncbi.nlm.nih.gov/pubmed/29462247
http://dx.doi.org/10.1093/bioinformatics/bty087
author Fabris, Fabio
Doherty, Aoife
Palmer, Daniel
de Magalhães, João Pedro
Freitas, Alex A
collection PubMed
description MOTIVATION: This work uses the Random Forest (RF) classification algorithm to predict if a gene is over-expressed, under-expressed or has no change in expression with age in the brain. RFs have high predictive power, and RF models can be interpreted using a feature (variable) importance measure. However, current feature importance measures evaluate a feature as a whole (all feature values). We show that, for a popular type of biological data (Gene Ontology-based), usually only one value of a feature is particularly important for classification and the interpretation of the RF model. Hence, we propose a new algorithm for identifying the most important and most informative feature values in an RF model. RESULTS: The new feature importance measure identified highly relevant Gene Ontology terms for the aforementioned gene classification task, producing a feature ranking that is much more informative to biologists than an alternative, state-of-the-art feature importance measure. AVAILABILITY AND IMPLEMENTATION: The dataset and source codes used in this paper are available as ‘Supplementary Material’ and the description of the data can be found at: https://fabiofabris.github.io/bioinfo2018/web/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6041990
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6041990 2018-07-17 A new approach for interpreting Random Forest models and its application to the biology of ageing Fabris, Fabio; Doherty, Aoife; Palmer, Daniel; de Magalhães, João Pedro; Freitas, Alex A Bioinformatics Original Papers Oxford University Press 2018-07-15 2018-02-16 /pmc/articles/PMC6041990/ /pubmed/29462247 http://dx.doi.org/10.1093/bioinformatics/bty087 Text en © The Author(s) 2018. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A new approach for interpreting Random Forest models and its application to the biology of ageing
topic Original Papers