
Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method

BACKGROUND: Data mining can be utilized to automate analysis of substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate...


Bibliographic Details
Main Authors:	Siadaty, Mir S, Knaus, William A
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1420278/ https://www.ncbi.nlm.nih.gov/pubmed/16522200 http://dx.doi.org/10.1186/1472-6947-6-13

collection	PubMed
description	BACKGROUND: Data mining can be used to automate the analysis of the substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness), leaving a serious bottleneck. In this paper we propose a method for automatically acquiring knowledge to shorten the pattern list by locating the novel and interesting patterns.
METHODS: The dual-mining method is based on automatically comparing the strength of patterns mined from a database with the strength of equivalent patterns mined from a relevant knowledgebase. When these two estimates of pattern strength do not match, a high "surprise score" is assigned to the pattern, identifying it as potentially interesting. The surprise score captures the degree of novelty or interestingness of the mined pattern. In addition, we show how to compute p values for each surprise score, thus filtering out noise and attaching statistical significance.
RESULTS: We have implemented the dual-mining method using scripts written in Perl and R. We applied the method to a large patient database and a biomedical literature citation knowledgebase. The system estimated association scores for 50,000 patterns, composed of disease entities and lab results, by querying the database and the knowledgebase. It then computed the surprise scores by comparing the pairs of association scores. Finally, the system estimated the statistical significance of the scores.
CONCLUSION: The dual-mining method eliminates more than 90% of patterns with strong associations, thus identifying them as uninteresting. We found that the pruning of patterns using the surprise score matched the biomedical evidence in the 100 cases that were examined by hand. The method automates the acquisition of knowledge, thus reducing dependence on knowledge elicited from human experts, which is usually a rate-limiting step.
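The comparison described under METHODS can be illustrated with a minimal sketch. The abstract does not state which association measure or significance test the authors used, so everything here is an assumption made for illustration: association strength is taken to be a log odds ratio from a 2x2 co-occurrence table, the surprise score is the difference between the database and knowledgebase log odds ratios, and the p value comes from a normal approximation. The function names `log_odds_ratio` and `surprise` are hypothetical and do not come from the paper.

```python
import math
from statistics import NormalDist


def log_odds_ratio(a, b, c, d):
    """Return (log odds ratio, approximate variance) for a 2x2 table.

    a: both items present, b: first only, c: second only, d: neither.
    A 0.5 continuity correction keeps the ratio finite when a cell is zero.
    """
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def surprise(db_table, kb_table):
    """Compare one pattern's strength in the database and the knowledgebase.

    Assumed definition: surprise = log OR (database) - log OR (knowledgebase).
    Returns (surprise score, two-sided p value under a normal approximation).
    """
    lor_db, var_db = log_odds_ratio(*db_table)
    lor_kb, var_kb = log_odds_ratio(*kb_table)
    score = lor_db - lor_kb
    z = score / math.sqrt(var_db + var_kb)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return score, p_value


# Made-up counts: a strong association in the patient data that is
# absent from the literature counts yields a large surprise score.
score, p_value = surprise((40, 10, 10, 40), (25, 25, 25, 25))
print(f"surprise={score:.2f}, p={p_value:.2g}")
```

Under these assumptions, a pattern that is equally strong in both sources scores zero and would be pruned as already known, while a pattern with a large score and a small p value would be kept for review.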
id	pubmed-1420278
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-1420278 2006-03-30 BMC Med Inform Decis Mak Research Article BioMed Central 2006-03-07 /pmc/articles/PMC1420278/ /pubmed/16522200 http://dx.doi.org/10.1186/1472-6947-6-13 Text en Copyright © 2006 Siadaty and Knaus; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
