Decision tree-based method for integrating gene expression, demographic, and clinical data to determine disease endotypes

Bibliographic Details
Main authors: Williams-DeVane, ClarLynda R; Reif, David M; Cohen Hubal, Elaine; Bushel, Pierre R; Hudgens, Edward E; Gallagher, Jane E; Edwards, Stephen W
Format: Online article, text
Language: English
Published: BioMed Central 2013
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4228284/
https://www.ncbi.nlm.nih.gov/pubmed/24188919
http://dx.doi.org/10.1186/1752-0509-7-119
Collection: PubMed
Abstract: BACKGROUND: Complex diseases are often difficult to diagnose, treat and study due to the multi-factorial nature of the underlying etiology. Large data sets are now widely available that can be used to define novel, mechanistically distinct disease subtypes (endotypes) in a completely data-driven manner. However, significant challenges exist with regard to how to segregate individuals into suitable subtypes of the disease and understand the distinct biological mechanisms of each when the goal is to maximize the discovery potential of these data sets. RESULTS: A multi-step decision tree-based method is described for defining endotypes based on gene expression, clinical covariates, and disease indicators using childhood asthma as a case study. We attempted to use alternative approaches such as the Student’s t-test, single data domain clustering and the Modk-prototypes algorithm (which incorporates multiple data domains into a single analysis); none performed as well as the novel multi-step decision tree method. This new method gave the best segregation of asthmatics and non-asthmatics, and it provides easy access to all genes and clinical covariates that distinguish the groups. CONCLUSIONS: The multi-step decision tree method described here will lead to better understanding of complex disease in general by allowing purely data-driven disease endotypes to facilitate the discovery of new mechanisms underlying these diseases. This application should be considered a complement to ongoing efforts to better define and diagnose known endotypes. When coupled with existing methods developed to determine the genetics of gene expression, these methods provide a mechanism for linking genetics and exposomics data and thereby accounting for both major determinants of disease.
Record ID: pubmed-4228284
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Source: BMC Syst Biol (Methodology Article), 2013-11-04
Copyright © 2013 Williams-DeVane et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.