A new approach for interpreting Random Forest models and its application to the biology of ageing


Bibliographic Details
Main authors: Fabris, Fabio, Doherty, Aoife, Palmer, Daniel, de Magalhães, João Pedro, Freitas, Alex A
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6041990/
https://www.ncbi.nlm.nih.gov/pubmed/29462247
http://dx.doi.org/10.1093/bioinformatics/bty087
Description
Summary:
MOTIVATION: This work uses the Random Forest (RF) classification algorithm to predict if a gene is over-expressed, under-expressed or has no change in expression with age in the brain. RFs have high predictive power, and RF models can be interpreted using a feature (variable) importance measure. However, current feature importance measures evaluate a feature as a whole (all feature values). We show that, for a popular type of biological data (Gene Ontology-based), usually only one value of a feature is particularly important for classification and the interpretation of the RF model. Hence, we propose a new algorithm for identifying the most important and most informative feature values in an RF model.
RESULTS: The new feature importance measure identified highly relevant Gene Ontology terms for the aforementioned gene classification task, producing a feature ranking that is much more informative to biologists than an alternative, state-of-the-art feature importance measure.
AVAILABILITY AND IMPLEMENTATION: The dataset and source codes used in this paper are available as ‘Supplementary Material’ and the description of the data can be found at: https://fabiofabris.github.io/bioinfo2018/web/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
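
The observation that only one value of a binary Gene Ontology feature tends to be informative can be illustrated with a small sketch. This is not the algorithm proposed in the paper, and it does not train a Random Forest. It only computes, on invented toy data, how concentrated the class labels are among genes that take each feature value. The gene counts, the `genes` list and the `purity_by_value` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: one (annotated_with_term, expression_class) pair per gene.
# annotated_with_term is 1 if the gene carries a given GO term, 0 otherwise.
genes = (
    [(1, "over")] * 9 + [(1, "none")] * 1
    + [(0, "over")] * 10 + [(0, "under")] * 10 + [(0, "none")] * 10
)

def class_distribution(rows):
    """Relative frequency of each class label in the given rows."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def purity_by_value(rows):
    """Largest class share among the genes taking each feature value."""
    result = {}
    for value in sorted({v for v, _ in rows}):
        subset = [r for r in rows if r[0] == value]
        result[value] = max(class_distribution(subset).values())
    return result

purity = purity_by_value(genes)
print(purity)
```

In this toy set, genes annotated with the term (value 1) are dominated by a single class, while genes without it (value 0) are spread evenly across the three classes. A measure that scores the feature as a whole would blend these two cases, which is the limitation the summary describes.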