Data mining of the GAW14 simulated data using rough set theory and tree-based methods
Rough set theory and decision trees are data mining methods used for dealing with vagueness and uncertainty. They have been utilized to unearth hidden patterns in complicated datasets collected for industrial processes. The Genetic Analysis Workshop 14 simulated data were generated using a system th...
Main authors: | Wei, Liang-Ying; Huang, Cheng-Lung; Chen, Chien-Hsiun |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2005 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1866803/ https://www.ncbi.nlm.nih.gov/pubmed/16451592 http://dx.doi.org/10.1186/1471-2156-6-S1-S133 |
_version_ | 1782133332955365376 |
---|---|
author | Wei, Liang-Ying Huang, Cheng-Lung Chen, Chien-Hsiun |
author_facet | Wei, Liang-Ying Huang, Cheng-Lung Chen, Chien-Hsiun |
author_sort | Wei, Liang-Ying |
collection | PubMed |
description | Rough set theory and decision trees are data mining methods used for dealing with vagueness and uncertainty. They have been utilized to unearth hidden patterns in complicated datasets collected for industrial processes. The Genetic Analysis Workshop 14 simulated data were generated using a system that implemented multiple correlations among four consequential layers of genetic data (disease-related loci, endophenotypes, phenotypes, and one disease trait). When information of one layer was blocked and uncertainty was created in the correlations among these layers, the correlation between the first and last layers (susceptibility genes and the disease trait in this case), was not easily directly detected. In this study, we proposed a two-stage process that applied rough set theory and decision trees to identify genes susceptible to the disease trait. During the first stage, based on phenotypes of subjects and their parents, decision trees were built to predict trait values. Phenotypes retained in the decision trees were then advanced to the second stage, where rough set theory was applied to discover the minimal subsets of genes associated with the disease trait. For comparison, decision trees were also constructed to map susceptible genes during the second stage. Our results showed that the decision trees of the first stage had accuracy rates of about 99% in predicting the disease trait. The decision trees and rough set theory failed to identify the true disease-related loci. |
format | Text |
id | pubmed-1866803 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-18668032007-05-11 Data mining of the GAW14 simulated data using rough set theory and tree-based methods Wei, Liang-Ying Huang, Cheng-Lung Chen, Chien-Hsiun BMC Genet Proceedings Rough set theory and decision trees are data mining methods used for dealing with vagueness and uncertainty. They have been utilized to unearth hidden patterns in complicated datasets collected for industrial processes. The Genetic Analysis Workshop 14 simulated data were generated using a system that implemented multiple correlations among four consequential layers of genetic data (disease-related loci, endophenotypes, phenotypes, and one disease trait). When information of one layer was blocked and uncertainty was created in the correlations among these layers, the correlation between the first and last layers (susceptibility genes and the disease trait in this case), was not easily directly detected. In this study, we proposed a two-stage process that applied rough set theory and decision trees to identify genes susceptible to the disease trait. During the first stage, based on phenotypes of subjects and their parents, decision trees were built to predict trait values. Phenotypes retained in the decision trees were then advanced to the second stage, where rough set theory was applied to discover the minimal subsets of genes associated with the disease trait. For comparison, decision trees were also constructed to map susceptible genes during the second stage. Our results showed that the decision trees of the first stage had accuracy rates of about 99% in predicting the disease trait. The decision trees and rough set theory failed to identify the true disease-related loci. 
BioMed Central 2005-12-30 /pmc/articles/PMC1866803/ /pubmed/16451592 http://dx.doi.org/10.1186/1471-2156-6-S1-S133 Text en Copyright © 2005 Wei et al; licensee BioMed Central Ltd http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Wei, Liang-Ying Huang, Cheng-Lung Chen, Chien-Hsiun Data mining of the GAW14 simulated data using rough set theory and tree-based methods |
title | Data mining of the GAW14 simulated data using rough set theory and tree-based methods |
title_full | Data mining of the GAW14 simulated data using rough set theory and tree-based methods |
title_fullStr | Data mining of the GAW14 simulated data using rough set theory and tree-based methods |
title_full_unstemmed | Data mining of the GAW14 simulated data using rough set theory and tree-based methods |
title_short | Data mining of the GAW14 simulated data using rough set theory and tree-based methods |
title_sort | data mining of the gaw14 simulated data using rough set theory and tree-based methods |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1866803/ https://www.ncbi.nlm.nih.gov/pubmed/16451592 http://dx.doi.org/10.1186/1471-2156-6-S1-S133 |
work_keys_str_mv | AT weiliangying dataminingofthegaw14simulateddatausingroughsettheoryandtreebasedmethods AT huangchenglung dataminingofthegaw14simulateddatausingroughsettheoryandtreebasedmethods AT chenchienhsiun dataminingofthegaw14simulateddatausingroughsettheoryandtreebasedmethods |