Boosting alternating decision trees modeling of disease trait information
We applied the alternating decision trees (ADTrees) method to the last 3 replicates from the Aipotu, Danacca, Karangar, and NYC populations in the Problem 2 simulated Genetic Analysis Workshop dataset. Using information from the 12 binary phenotypes and sex as input and Kofendrerd Personality Disord...
Main authors: | Liu, Kuang-Yu; Lin, Jennifer; Zhou, Xiaobo; Wong, Stephen TC |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2005 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1866804/ https://www.ncbi.nlm.nih.gov/pubmed/16451591 http://dx.doi.org/10.1186/1471-2156-6-S1-S132 |
_version_ | 1782133333196537856 |
---|---|
author | Liu, Kuang-Yu Lin, Jennifer Zhou, Xiaobo Wong, Stephen TC |
author_facet | Liu, Kuang-Yu Lin, Jennifer Zhou, Xiaobo Wong, Stephen TC |
author_sort | Liu, Kuang-Yu |
collection | PubMed |
description | We applied the alternating decision trees (ADTrees) method to the last 3 replicates from the Aipotu, Danacca, Karangar, and NYC populations in the Problem 2 simulated Genetic Analysis Workshop dataset. Using information from the 12 binary phenotypes and sex as input and Kofendrerd Personality Disorder disease status as the outcome of ADTrees-based classifiers, we obtained a new quantitative trait based on average prediction scores, which was then used for genome-wide quantitative trait linkage (QTL) analysis. ADTrees are machine learning methods that combine boosting and decision tree algorithms to generate smaller and easier-to-interpret classification rules. In this application, we compared four modeling strategies formed by combining two boosting variants (log or exponential loss functions) with two tree generation types (a full alternating decision tree or a classic boosting decision tree). These four strategies were applied to the founders in each population to construct four classifiers, which were then applied to each study participant. To compute an average prediction score for each subject with a specific trait profile, this process was repeated over 10 runs of 10-fold cross-validation, and the standardized prediction scores obtained from the 10 runs were averaged and used in subsequent expectation-maximization Haseman-Elston QTL analyses (implemented in GENEHUNTER) with the approximately 900 SNPs in Hardy-Weinberg equilibrium provided for each population. Our QTL analyses on the basis of the four models (a full alternating decision tree and a classic boosting decision tree, each paired with either a log or an exponential loss function) detected evidence for linkage (Z ≥ 1.96, p < 0.01) on chromosomes 1, 3, 5, and 9. Moreover, using average iteration and abundance scores for the 12 phenotypes and sex as their relevancy measurements, we identified all relevant phenotypes for all four populations except phenotype b for the Karangar population, with a suggested subgroup structure consistent with the latent traits used in the model. In conclusion, our findings suggest that the ADTrees method may offer a more accurate representation of the disease status that allows for better detection of linkage evidence. |
format | Text |
id | pubmed-1866804 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1866804 2007-05-11 Boosting alternating decision trees modeling of disease trait information Liu, Kuang-Yu Lin, Jennifer Zhou, Xiaobo Wong, Stephen TC BMC Genet Proceedings BioMed Central 2005-12-30 /pmc/articles/PMC1866804/ /pubmed/16451591 http://dx.doi.org/10.1186/1471-2156-6-S1-S132 Text en Copyright © 2005 Liu et al; licensee BioMed Central Ltd http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Liu, Kuang-Yu Lin, Jennifer Zhou, Xiaobo Wong, Stephen TC Boosting alternating decision trees modeling of disease trait information |
title | Boosting alternating decision trees modeling of disease trait information |
title_full | Boosting alternating decision trees modeling of disease trait information |
title_fullStr | Boosting alternating decision trees modeling of disease trait information |
title_full_unstemmed | Boosting alternating decision trees modeling of disease trait information |
title_short | Boosting alternating decision trees modeling of disease trait information |
title_sort | boosting alternating decision trees modeling of disease trait information |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1866804/ https://www.ncbi.nlm.nih.gov/pubmed/16451591 http://dx.doi.org/10.1186/1471-2156-6-S1-S132 |
work_keys_str_mv | AT liukuangyu boostingalternatingdecisiontreesmodelingofdiseasetraitinformation AT linjennifer boostingalternatingdecisiontreesmodelingofdiseasetraitinformation AT zhouxiaobo boostingalternatingdecisiontreesmodelingofdiseasetraitinformation AT wongstephentc boostingalternatingdecisiontreesmodelingofdiseasetraitinformation |
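
The score-averaging step described in the abstract (10 runs of 10-fold cross-validation, with standardized prediction scores averaged across runs) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `train` and `score` callables are hypothetical placeholders for an ADTrees classifier, `train_toy` and `score_toy` are a deliberately simple stand-in scorer, and standardizing by z-scoring within each run is an assumption about what "standardized" means in the abstract.

```python
import random
import statistics


def repeated_cv_scores(X, y, train, score, n_runs=10, n_folds=10, seed=0):
    """Average standardized out-of-fold prediction scores over repeated k-fold CV.

    `train(X_train, y_train)` returns a model and `score(model, x)` returns a
    real-valued prediction score. Both are hypothetical placeholders for a
    boosted classifier such as ADTrees.
    """
    rng = random.Random(seed)
    n = len(X)
    totals = [0.0] * n
    for _ in range(n_runs):
        order = list(range(n))
        rng.shuffle(order)  # new random fold assignment for each run
        run_scores = [0.0] * n
        for k in range(n_folds):
            test_idx = order[k::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in test_idx:
                run_scores[i] = score(model, X[i])
        # Assumption: "standardized" is taken to mean z-scoring within a run.
        mean = statistics.fmean(run_scores)
        sd = statistics.pstdev(run_scores)
        for i in range(n):
            totals[i] += (run_scores[i] - mean) / sd if sd > 0 else 0.0
    return [t / n_runs for t in totals]


def train_toy(X_train, y_train):
    """Toy stand-in: weight each binary feature by the mean label (+1/-1)
    among training subjects who carry that feature."""
    weights = []
    for j in range(len(X_train[0])):
        labels = [lab for x, lab in zip(X_train, y_train) if x[j] == 1]
        weights.append(statistics.fmean(labels) if labels else 0.0)
    return weights


def score_toy(weights, x):
    """Prediction score: sum of the weights of the features that are present."""
    return sum(w * xj for w, xj in zip(weights, x))
```

Calling `repeated_cv_scores(X, y, train_toy, score_toy)` with one row of 13 binary inputs per subject (12 phenotypes plus sex) and labels coded +1/-1 returns one averaged score per subject, which could then play the role of the quantitative trait passed to a linkage analysis.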