Efficient Mining of Interesting Patterns in Large Biological Sequences

Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences has been a major measure for determining whether a pattern is interesting. In computational biology,...

Bibliographic Details
Main authors: Rashid, Md. Mamunur, Karim, Md. Rezaul, Jeong, Byeong-Soo, Choi, Ho-Jin
Format: Online Article Text
Language: English
Published: Korea Genome Organization 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3475482/
https://www.ncbi.nlm.nih.gov/pubmed/23105928
http://dx.doi.org/10.5808/GI.2012.10.1.44
author Rashid, Md. Mamunur
Karim, Md. Rezaul
Jeong, Byeong-Soo
Choi, Ho-Jin
collection PubMed
description Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. So far, in most approaches, the number of occurrences has been a major measure for determining whether a pattern is interesting. In computational biology, however, a pattern that is not frequent may still be considered very informative if its actual support frequency exceeds the prior expectation by a large margin. In this paper, we propose a new interestingness measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time.
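The idea that a rare pattern can still be informative when its observed support far exceeds its expected support can be illustrated with a small sketch. This record does not give the paper's actual measure or its index-based mining method, so the code below is not the authors' approach: it assumes a simple observed-to-expected ratio under a background model in which each base is drawn independently from the sequence's own base frequencies. The function names `count_occurrences`, `expected_occurrences`, and `surprise_ratio` are hypothetical.

```python
from collections import Counter


def count_occurrences(sequence: str, pattern: str) -> int:
    """Count occurrences of pattern in sequence, allowing overlaps."""
    if not pattern:
        return 0
    return sum(
        1
        for i in range(len(sequence) - len(pattern) + 1)
        if sequence.startswith(pattern, i)
    )


def expected_occurrences(sequence: str, pattern: str) -> float:
    """Expected count under an ASSUMED background model: every base is
    drawn independently with the base frequencies of the sequence itself."""
    n = len(sequence)
    positions = n - len(pattern) + 1
    if positions <= 0:
        return 0.0
    freq = Counter(sequence)
    prob = 1.0
    for base in pattern:
        prob *= freq[base] / n
    return positions * prob


def surprise_ratio(sequence: str, pattern: str) -> float:
    """Observed / expected count. This ratio is an illustrative stand-in,
    not the interestingness measure proposed in the paper."""
    expected = expected_occurrences(sequence, pattern)
    if expected == 0.0:
        return 0.0
    return count_occurrences(sequence, pattern) / expected


if __name__ == "__main__":
    seq = "A" * 16 + "CGCG"
    for pat in ("AA", "CG"):
        print(pat, count_occurrences(seq, pat), surprise_ratio(seq, pat))
```

In this toy sequence, `AA` occurs far more often than `CG`, yet `CG` has the higher ratio because its bases are rare in the background. A pure frequency threshold would discard exactly that kind of pattern.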
format Online
Article
Text
id pubmed-3475482
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Korea Genome Organization
record_format MEDLINE/PubMed
spelling pubmed-3475482 2012-10-26 Genomics Inf Article Korea Genome Organization 2012-03 2012-03-31 /pmc/articles/PMC3475482/ /pubmed/23105928 http://dx.doi.org/10.5808/GI.2012.10.1.44 Text en Copyright © 2012 by The Korea Genome Organization http://creativecommons.org/licenses/by-nc/3.0 It is identical to the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/).