Decision trees in epidemiological research

BACKGROUND: In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups...

Bibliographic Details
Main authors: Venkatasubramaniam, Ashwini; Wolfson, Julian; Mitchell, Nathan; Barnes, Timothy; JaKa, Meghan; French, Simone
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5607590/
https://www.ncbi.nlm.nih.gov/pubmed/28943885
http://dx.doi.org/10.1186/s12982-017-0064-4
_version_ 1783265318054920192
author Venkatasubramaniam, Ashwini
Wolfson, Julian
Mitchell, Nathan
Barnes, Timothy
JaKa, Meghan
French, Simone
author_facet Venkatasubramaniam, Ashwini
Wolfson, Julian
Mitchell, Nathan
Barnes, Timothy
JaKa, Meghan
French, Simone
author_sort Venkatasubramaniam, Ashwini
collection PubMed
description BACKGROUND: In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. MAIN TEXT: We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. CONCLUSIONS: Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12982-017-0064-4) contains supplementary material, which is available to authorized users.
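The partitioning idea summarized in the description can be illustrated with a minimal sketch. It assumes a CART-style regression criterion (choose the binary split that minimizes the within-subgroup sum of squared errors) and a simple minimum-gain stopping rule. This is not the authors' code: the toy data, the variable names (`age`, `sex`, `y`) and the `min_gain` threshold are hypothetical, CART normally relies on pruning rather than a fixed gain threshold, and CTree instead chooses and stops splits using statistical hypothesis tests, which this sketch does not implement.

```python
from statistics import mean

# Hypothetical toy data (NOT from the Box Lunch Study): two covariates and an outcome y.
DATA = [
    {"age": 25, "sex": 0, "y": 10.0}, {"age": 30, "sex": 0, "y": 11.0},
    {"age": 28, "sex": 1, "y": 20.0}, {"age": 35, "sex": 1, "y": 21.0},
    {"age": 50, "sex": 0, "y": 40.0}, {"age": 55, "sex": 1, "y": 41.0},
    {"age": 60, "sex": 0, "y": 39.0}, {"age": 65, "sex": 1, "y": 40.0},
]

def sse(values):
    """Sum of squared deviations of the outcome from its subgroup mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, covariates, outcome="y"):
    """Return (covariate, cut, total SSE) of the best binary split, or None."""
    best = None
    for cov in covariates:
        levels = sorted({r[cov] for r in rows})
        # Candidate cut points: midpoints between consecutive observed values.
        for lo, hi in zip(levels, levels[1:]):
            cut = (lo + hi) / 2
            left = [r[outcome] for r in rows if r[cov] <= cut]
            right = [r[outcome] for r in rows if r[cov] > cut]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (cov, cut, total)
    return best

def grow(rows, covariates, outcome="y", min_gain=5.0, rule=()):
    """Recursively partition rows; return a list of (rule, n, mean outcome)."""
    ys = [r[outcome] for r in rows]
    split = best_split(rows, covariates, outcome)
    # Stop when no split exists or the SSE reduction is below min_gain
    # (an assumed, simplified stopping rule).
    if split is None or sse(ys) - split[2] < min_gain:
        return [(" & ".join(rule) or "all", len(rows), mean(ys))]
    cov, cut, _ = split
    left = [r for r in rows if r[cov] <= cut]
    right = [r for r in rows if r[cov] > cut]
    return (grow(left, covariates, outcome, min_gain, rule + (f"{cov} <= {cut}",))
            + grow(right, covariates, outcome, min_gain, rule + (f"{cov} > {cut}",)))

subgroups = grow(DATA, ["age", "sex"])
for rule, n, avg in subgroups:
    print(f"{rule}: n={n}, mean outcome={avg}")
```

Each returned tuple describes one subgroup by the covariate rule that defines it, its size, and its mean outcome, which is the sense in which a tree yields subgroups that are homogeneous with respect to the outcome.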
format Online
Article
Text
id pubmed-5607590
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5607590 2017-09-24 Decision trees in epidemiological research Venkatasubramaniam, Ashwini Wolfson, Julian Mitchell, Nathan Barnes, Timothy JaKa, Meghan French, Simone Emerg Themes Epidemiol Analytic Perspective BioMed Central 2017-09-20 /pmc/articles/PMC5607590/ /pubmed/28943885 http://dx.doi.org/10.1186/s12982-017-0064-4 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Analytic Perspective
Venkatasubramaniam, Ashwini
Wolfson, Julian
Mitchell, Nathan
Barnes, Timothy
JaKa, Meghan
French, Simone
Decision trees in epidemiological research
title Decision trees in epidemiological research
title_full Decision trees in epidemiological research
title_fullStr Decision trees in epidemiological research
title_full_unstemmed Decision trees in epidemiological research
title_short Decision trees in epidemiological research
title_sort decision trees in epidemiological research
topic Analytic Perspective
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5607590/
https://www.ncbi.nlm.nih.gov/pubmed/28943885
http://dx.doi.org/10.1186/s12982-017-0064-4
work_keys_str_mv AT venkatasubramaniamashwini decisiontreesinepidemiologicalresearch
AT wolfsonjulian decisiontreesinepidemiologicalresearch
AT mitchellnathan decisiontreesinepidemiologicalresearch
AT barnestimothy decisiontreesinepidemiologicalresearch
AT jakameghan decisiontreesinepidemiologicalresearch
AT frenchsimone decisiontreesinepidemiologicalresearch