
Data mining of the GAW14 simulated data using rough set theory and tree-based methods

Rough set theory and decision trees are data mining methods used for dealing with vagueness and uncertainty. They have been utilized to unearth hidden patterns in complicated datasets collected for industrial processes. The Genetic Analysis Workshop 14 simulated data were generated using a system th...


Bibliographic Details
Main authors:	Wei, Liang-Ying, Huang, Cheng-Lung, Chen, Chien-Hsiun
Format:	Text
Language:	English
Published:	BioMed Central 2005
Subjects:	Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1866803/ https://www.ncbi.nlm.nih.gov/pubmed/16451592 http://dx.doi.org/10.1186/1471-2156-6-S1-S133

author	Wei, Liang-Ying Huang, Cheng-Lung Chen, Chien-Hsiun
collection	PubMed
description	Rough set theory and decision trees are data mining methods for dealing with vagueness and uncertainty. They have been used to uncover hidden patterns in complicated datasets collected for industrial processes. The Genetic Analysis Workshop 14 simulated data were generated using a system that implemented multiple correlations among four consequential layers of genetic data (disease-related loci, endophenotypes, phenotypes, and one disease trait). When information from one layer was blocked and uncertainty was created in the correlations among these layers, the correlation between the first and last layers (susceptibility genes and the disease trait in this case) was not easy to detect directly. In this study, we proposed a two-stage process that applied rough set theory and decision trees to identify susceptibility genes for the disease trait. During the first stage, decision trees were built from the phenotypes of subjects and their parents to predict trait values. Phenotypes retained in the decision trees were then advanced to the second stage, where rough set theory was applied to discover the minimal subsets of genes associated with the disease trait. For comparison, decision trees were also constructed to map susceptibility genes during the second stage. Our results showed that the decision trees of the first stage had accuracy rates of about 99% in predicting the disease trait. However, both the decision trees and rough set theory failed to identify the true disease-related loci.
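The "minimal subsets of genes" mentioned in the description correspond to what rough set theory calls reducts. The following is a minimal sketch of that idea, not the authors' implementation: it searches by brute force for the smallest attribute subsets that still determine the decision value in a small, consistent decision table. The marker names (g1, g2, g3), the toy rows, and both function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def is_consistent(rows, attrs, decision):
    """Return True if rows that agree on `attrs` always agree on `decision`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        # Remember the first decision seen for this attribute pattern and
        # report a conflict if a later row with the same pattern disagrees.
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False
    return True


def minimal_reducts(rows, attrs, decision):
    """Return the smallest attribute subsets that preserve the decision.

    Brute force over subsets by increasing size, so it is only practical
    for a handful of attributes. Assumes the full table is consistent;
    otherwise an empty list is returned.
    """
    for size in range(1, len(attrs) + 1):
        found = [subset for subset in combinations(attrs, size)
                 if is_consistent(rows, subset, decision)]
        if found:
            return found
    return []


# Hypothetical toy decision table: three markers and a binary trait.
TOY_ROWS = [
    {"g1": 0, "g2": 0, "g3": 1, "trait": 0},
    {"g1": 0, "g2": 1, "g3": 0, "trait": 1},
    {"g1": 1, "g2": 0, "g3": 0, "trait": 0},
    {"g1": 1, "g2": 1, "g3": 1, "trait": 1},
]

if __name__ == "__main__":
    print(minimal_reducts(TOY_ROWS, ["g1", "g2", "g3"], "trait"))
```

In this toy table only g2 separates the two trait values on its own, so it is the single minimal subset returned. Real genotype data are rarely consistent in this strict sense, so practical rough set tools work with approximate reducts and heuristic search rather than exhaustive enumeration.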
format	Text
id	pubmed-1866803
institution	National Center for Biotechnology Information
language	English
publishDate	2005
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1866803 2007-05-11 BMC Genet Proceedings BioMed Central 2005-12-30 /pmc/articles/PMC1866803/ /pubmed/16451592 http://dx.doi.org/10.1186/1471-2156-6-S1-S133 Text en Copyright © 2005 Wei et al; licensee BioMed Central Ltd http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Data mining of the GAW14 simulated data using rough set theory and tree-based methods
topic	Proceedings
