
Proportional fault-tolerant data mining with applications to bioinformatics

The mining of frequent patterns in databases has been studied for several years, but few reports have discussed fault-tolerant (FT) pattern mining. FT data mining is more suitable for extracting interesting information from real-world data that may be polluted by noise. In particular, the increa...


Bibliographic Details
Main authors:	Lee, Guanling; Peng, Sheng-Lung; Lin, Yuh-Tzu
Format:	Online Article Text
Language:	English
Published:	Springer US 2009
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7087812/ https://www.ncbi.nlm.nih.gov/pubmed/32214877 http://dx.doi.org/10.1007/s10796-009-9158-z

author	Lee, Guanling; Peng, Sheng-Lung; Lin, Yuh-Tzu
collection	PubMed
description	The mining of frequent patterns in databases has been studied for several years, but few reports have discussed fault-tolerant (FT) pattern mining. FT data mining is more suitable for extracting interesting information from real-world data that may be polluted by noise. In particular, the growing volume of today’s biological databases calls for such a data mining technique to mine important data, e.g., motifs. In this paper, we propose the concept of proportional FT mining of frequent patterns. The number of tolerable faults in a proportional FT pattern is proportional to the length of the pattern. Two algorithms are designed to solve this problem. The first algorithm, named FT-BottomUp, applies an FT-Apriori heuristic and finds all FT patterns with any number of faults. The second algorithm, FT-LevelWise, divides all FT patterns into several groups according to the number of tolerable faults, and mines the content patterns of each group in turn. When our algorithm is applied to real data, two reported epitopes of spike proteins of SARS-CoV are found in the resulting itemset, and proportional FT data mining is better than fixed FT data mining for this application.
format	Online Article Text
id	pubmed-7087812
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	Springer US
record_format	MEDLINE/PubMed
spelling	pubmed-7087812 2020-03-23 Proportional fault-tolerant data mining with applications to bioinformatics Lee, Guanling; Peng, Sheng-Lung; Lin, Yuh-Tzu Inf Syst Front Article Springer US 2009-02-19 2009 /pmc/articles/PMC7087812/ /pubmed/32214877 http://dx.doi.org/10.1007/s10796-009-9158-z Text en © Springer Science+Business Media, LLC 2009 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title	Proportional fault-tolerant data mining with applications to bioinformatics
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7087812/ https://www.ncbi.nlm.nih.gov/pubmed/32214877 http://dx.doi.org/10.1007/s10796-009-9158-z
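
The description field states that, in proportional FT mining, the number of tolerable faults grows with the length of the pattern. The record does not give the paper's formal definitions, so the following is only a minimal sketch under stated assumptions: a "fault" is taken to be a pattern item missing from a transaction, the tolerance is taken to be floor(ratio × pattern length), and the names `tolerable_faults`, `ft_contains`, `ft_support`, and `fault_ratio` are hypothetical. It is not the FT-BottomUp or FT-LevelWise algorithm, only an illustration of proportional fault-tolerant support counting.

```python
from math import floor
from typing import Iterable, Set


def tolerable_faults(pattern_len: int, fault_ratio: float) -> int:
    """Assumed rule: allowed faults = floor(fault_ratio * pattern length)."""
    return floor(pattern_len * fault_ratio)


def ft_contains(transaction: Set[str], pattern: Set[str], fault_ratio: float) -> bool:
    """True if the transaction misses no more pattern items than tolerated."""
    missing = len(pattern - transaction)
    return missing <= tolerable_faults(len(pattern), fault_ratio)


def ft_support(transactions: Iterable[Set[str]], pattern: Set[str], fault_ratio: float) -> int:
    """Count the transactions that FT-contain the pattern."""
    return sum(1 for t in transactions if ft_contains(t, pattern, fault_ratio))


# Toy data (made up for illustration, not taken from the paper).
transactions = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a"},
]

long_pattern = {"a", "b", "c", "d"}
short_pattern = {"a", "b"}

# With a ratio of 0.25, a 4-item pattern tolerates one missing item,
# while a 2-item pattern tolerates none.
long_support = ft_support(transactions, long_pattern, 0.25)
short_support = ft_support(transactions, short_pattern, 0.25)
exact_support = ft_support(transactions, long_pattern, 0.0)
```

Under these assumptions, a fixed tolerance of one fault would let a 2-item pattern match any transaction holding just one of its items, whereas the proportional rule keeps short patterns exact and relaxes only longer ones.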
