An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among...

Bibliographic Details
Main Authors:	Karim, Md. Rezaul, Rashid, Md. Mamunur, Jeong, Byeong-Soo, Choi, Ho-Jin
Format:	Online Article Text
Language:	English
Published:	Korea Genome Organization 2012
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3475483/ https://www.ncbi.nlm.nih.gov/pubmed/23105929 http://dx.doi.org/10.5808/GI.2012.10.1.51

_version_	1782246955103027200
author	Karim, Md. Rezaul Rashid, Md. Mamunur Jeong, Byeong-Soo Choi, Ho-Jin
collection	PubMed
description	Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
format	Online Article Text
id	pubmed-3475483
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	Korea Genome Organization
record_format	MEDLINE/PubMed
spelling	pubmed-3475483 2012-10-26 An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases Karim, Md. Rezaul Rashid, Md. Mamunur Jeong, Byeong-Soo Choi, Ho-Jin Genomics Inf Article Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time. Korea Genome Organization 2012-03 2012-03-31 /pmc/articles/PMC3475483/ /pubmed/23105929 http://dx.doi.org/10.5808/GI.2012.10.1.51 Text en Copyright © 2012 by The Korea Genome Organization http://creativecommons.org/licenses/by-nc/3.0 It is identical to the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/).
title	An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3475483/ https://www.ncbi.nlm.nih.gov/pubmed/23105929 http://dx.doi.org/10.5808/GI.2012.10.1.51

Similar Items