
Boosting alternating decision trees modeling of disease trait information

We applied the alternating decision trees (ADTrees) method to the last 3 replicates from the Aipotu, Danacca, Karangar, and NYC populations in the Problem 2 simulated Genetic Analysis Workshop dataset. Using information from the 12 binary phenotypes and sex as input and Kofendrerd Personality Disord...


Bibliographic Details
Main authors: Liu, Kuang-Yu, Lin, Jennifer, Zhou, Xiaobo, Wong, Stephen TC
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1866804/
https://www.ncbi.nlm.nih.gov/pubmed/16451591
http://dx.doi.org/10.1186/1471-2156-6-S1-S132
author Liu, Kuang-Yu
Lin, Jennifer
Zhou, Xiaobo
Wong, Stephen TC
collection PubMed
description We applied the alternating decision trees (ADTrees) method to the last 3 replicates from the Aipotu, Danacca, Karangar, and NYC populations in the Problem 2 simulated Genetic Analysis Workshop dataset. Using information from the 12 binary phenotypes and sex as input and Kofendrerd Personality Disorder disease status as the outcome of ADTrees-based classifiers, we obtained a new quantitative trait based on average prediction scores, which was then used for genome-wide quantitative trait linkage (QTL) analysis. ADTrees are machine learning methods that combine boosting and decision tree algorithms to generate smaller and easier-to-interpret classification rules. In this application, we compared four modeling strategies formed by combining two boosting loss functions (log or exponential) with two tree generation types (a full alternating decision tree or a classic boosting decision tree). These four strategies were applied to the founders in each population to construct four classifiers, which were then applied to each study participant. To compute an average prediction score for each subject with a specific trait profile, this process was repeated with 10 runs of 10-fold cross-validation, and the standardized prediction scores from the 10 runs were averaged and used in subsequent expectation-maximization Haseman-Elston QTL analyses (implemented in GENEHUNTER) with the approximately 900 SNPs in Hardy-Weinberg equilibrium provided for each population. Our QTL analyses based on the four models (a full alternating decision tree and a classic boosting decision tree, each paired with either the log or the exponential loss function) detected evidence for linkage (Z ≥ 1.96, p < 0.01) on chromosomes 1, 3, 5, and 9.
Moreover, using average iteration and abundance scores for the 12 phenotypes and sex as measures of their relevance, we identified all relevant phenotypes in all four populations, except phenotype b in the Karangar population, with a suggested subgroup structure consistent with the latent traits used in the model. In conclusion, our findings suggest that the ADTrees method may offer a more accurate representation of disease status, one that allows for better detection of linkage evidence.
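The score-averaging step described in the abstract (10 runs of 10-fold cross-validation, standardizing the prediction scores within each run and averaging across runs) can be sketched roughly as follows. This is an illustration only, not the authors' code: `train_placeholder` is a hypothetical stand-in scorer built from smoothed log-odds weights, not an ADTrees learner, and the toy data from `make_toy` are invented for the example and are unrelated to the Genetic Analysis Workshop dataset.

```python
import math
import random
import statistics


def train_placeholder(train_rows):
    """Hypothetical stand-in for a boosted classifier (NOT ADTrees).

    Each row is (features, label) with binary features and label +1 / -1.
    Returns a function mapping a feature tuple to a real-valued score.
    """
    n_feat = len(train_rows[0][0])
    weights = []
    for j in range(n_feat):
        pos = sum(1 for x, y in train_rows if x[j] == 1 and y == 1)
        neg = sum(1 for x, y in train_rows if x[j] == 1 and y == -1)
        # Smoothed half log-odds of being affected when feature j is present.
        weights.append(0.5 * math.log((pos + 1) / (neg + 1)))
    return lambda x: sum(w for w, v in zip(weights, x) if v == 1)


def averaged_cv_scores(data, train_fn, n_runs=10, n_folds=10, seed=0):
    """Average standardized out-of-fold scores over repeated k-fold CV."""
    rng = random.Random(seed)
    n = len(data)
    totals = [0.0] * n
    for _ in range(n_runs):
        order = list(range(n))
        rng.shuffle(order)  # a fresh random fold assignment for each run
        scores = [0.0] * n
        for k in range(n_folds):
            test_idx = order[k::n_folds]
            held_out = set(test_idx)
            train = [data[i] for i in range(n) if i not in held_out]
            scorer = train_fn(train)
            for i in test_idx:
                scores[i] = scorer(data[i][0])
        # Standardize within the run so that runs are on a comparable scale.
        mean = statistics.fmean(scores)
        sd = statistics.pstdev(scores)
        for i in range(n):
            totals[i] += (scores[i] - mean) / sd if sd > 0 else 0.0
    return [t / n_runs for t in totals]


def make_toy(n=200, n_feat=13, seed=1):
    """Invented toy data: 13 binary inputs, label driven by the first three."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = tuple(rng.randint(0, 1) for _ in range(n_feat))
        y = 1 if x[0] + x[1] + x[2] >= 2 else -1
        rows.append((x, y))
    return rows


toy = make_toy()
trait = averaged_cv_scores(toy, train_placeholder)
print(len(trait), "averaged scores computed")
```

In this sketch the resulting `trait` list plays the role of the derived quantitative trait; in the study, the corresponding values were passed on to the Haseman-Elston QTL analysis in GENEHUNTER, which is not reproduced here.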
format Text
id pubmed-1866804
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1866804 2007-05-11 BMC Genet Proceedings BioMed Central 2005-12-30 /pmc/articles/PMC1866804/ /pubmed/16451591 http://dx.doi.org/10.1186/1471-2156-6-S1-S132 Text en Copyright © 2005 Liu et al; licensee BioMed Central Ltd http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Boosting alternating decision trees modeling of disease trait information
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1866804/
https://www.ncbi.nlm.nih.gov/pubmed/16451591
http://dx.doi.org/10.1186/1471-2156-6-S1-S132