MpBsmi: A new algorithm for the recognition of continuous biological sequence pattern based on index structure

The extraction of meaningful patterns from biological sequence data is an important approach to discovering the biological regulatory rules of genes and proteins and their inheritance relationships. Existing sequence pattern discovery algorithms, such as MSPM and FBSB, suffer from low efficienc...

Bibliographic Details
Main Authors:	Li, Weina, Ren, Jiadong
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2018
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5912758/ https://www.ncbi.nlm.nih.gov/pubmed/29684052 http://dx.doi.org/10.1371/journal.pone.0195601

_version_	1783316418851241984
author	Li, Weina Ren, Jiadong
author_facet	Li, Weina Ren, Jiadong
author_sort	Li, Weina
collection	PubMed
description	The extraction of meaningful patterns from biological sequence data is an important approach to discovering the biological regulatory rules of genes and proteins and their inheritance relationships. Existing sequence pattern discovery algorithms, such as MSPM and FBSB, suffer from low efficiency and accuracy. To address this issue, this paper presents a new algorithm for biological sequence pattern mining, abbreviated MpBsmi, based on a data index structure. MpBsmi employs a sequence position table, abbreviated ST, and a sequence database index structure named DB-Index for data storage, mining and pattern expansion. The ST and DB-Index of single items are first obtained by scanning the sequence database once. A new fast support counting algorithm is then applied to the table ST to identify the frequent single items. Based on a connection strategy, the frequent patterns are expanded, and the expanded table ST is updated by scanning the DB-Index. The fast support counting algorithm is then used to obtain the frequent expanded patterns. Finally, a new pruning technique is applied to the extended patterns to avoid generating an unnecessarily large number of candidate patterns. Experimental results on multiple classical protein sequences from the Pfam database validate the performance of the proposed algorithm, including its accuracy, stability and scalability. The results show that the proposed algorithm achieves better space efficiency, stability and scalability than MSPM and FBSB, the two main algorithms for biological sequence mining.
format	Online Article Text
id	pubmed-5912758
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5912758 2018-05-05 MpBsmi: A new algorithm for the recognition of continuous biological sequence pattern based on index structure Li, Weina Ren, Jiadong PLoS One Research Article Public Library of Science 2018-04-23 /pmc/articles/PMC5912758/ /pubmed/29684052 http://dx.doi.org/10.1371/journal.pone.0195601 Text en © 2018 Li, Ren http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Li, Weina Ren, Jiadong MpBsmi: A new algorithm for the recognition of continuous biological sequence pattern based on index structure
title	MpBsmi: A new algorithm for the recognition of continuous biological sequence pattern based on index structure
title_full	MpBsmi: A new algorithm for the recognition of continuous biological sequence pattern based on index structure
title_fullStr	MpBsmi: A new algorithm for the recognition of continuous biological sequence pattern based on index structure
title_full_unstemmed	MpBsmi: A new algorithm for the recognition of continuous biological sequence pattern based on index structure
title_short	MpBsmi: A new algorithm for the recognition of continuous biological sequence pattern based on index structure
title_sort	mpbsmi: a new algorithm for the recognition of continuous biological sequence pattern based on index structure
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5912758/ https://www.ncbi.nlm.nih.gov/pubmed/29684052 http://dx.doi.org/10.1371/journal.pone.0195601
work_keys_str_mv	AT liweina mpbsmianewalgorithmfortherecognitionofcontinuousbiologicalsequencepatternbasedonindexstructure AT renjiadong mpbsmianewalgorithmfortherecognitionofcontinuousbiologicalsequencepatternbasedonindexstructure
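
The description field above outlines a pipeline: scan the database once to record item positions, count support, expand frequent patterns, and prune candidates. The following is a minimal Python sketch of that general idea for contiguous patterns, not the authors' implementation. The record gives no details of ST, DB-Index, the connection strategy or the pruning rule, so everything here is an assumption made for illustration: the function names are hypothetical, a nested dictionary of start positions stands in for the position table, the raw sequence list stands in for DB-Index, support is taken to be the number of sequences containing a pattern, and pruning simply extends patterns only with frequent single items.

```python
from collections import defaultdict


def build_position_table(sequences):
    """Scan the database once: item -> {sequence id: [start positions]}.

    Hypothetical stand-in for a position table of single items.
    """
    table = defaultdict(lambda: defaultdict(list))
    for sid, seq in enumerate(sequences):
        for pos, item in enumerate(seq):
            table[item][sid].append(pos)
    return table


def extend_right(entry, length, sequences, item):
    """Extend a pattern of the given length by one item on the right.

    Only the stored start positions are checked, so the sequences are
    never rescanned from the beginning.
    """
    extended = {}
    for sid, starts in entry.items():
        seq = sequences[sid]
        kept = [p for p in starts
                if p + length < len(seq) and seq[p + length] == item]
        if kept:
            extended[sid] = kept
    return extended


def mine_contiguous(sequences, min_support):
    """Return {pattern: support} for all frequent contiguous patterns.

    Support is assumed to be the number of sequences that contain the
    pattern at least once.
    """
    table = build_position_table(sequences)
    # Assumed pruning rule: infrequent single items can never be part of
    # a frequent pattern, so they are dropped before any expansion.
    frequent_items = sorted(i for i, e in table.items()
                            if len(e) >= min_support)
    level = {i: dict(table[i]) for i in frequent_items}
    result = {}
    length = 1
    while level:
        next_level = {}
        for pattern, entry in level.items():
            result[pattern] = len(entry)
            for item in frequent_items:
                candidate = extend_right(entry, length, sequences, item)
                if len(candidate) >= min_support:
                    next_level[pattern + item] = candidate
        level = next_level
        length += 1
    return result
```

For example, calling `mine_contiguous(["ABCAB", "ABD", "CAB"], 2)` keeps `A`, `B`, `C`, `AB`, `CA` and `CAB` and discards `D`, which occurs in only one sequence. Storing start positions per sequence makes each expansion step a lookup of one character per stored position, which is one plausible reading of why an index structure helps.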
