Similarity evaluation of DNA sequences based on frequent patterns and entropy

BACKGROUND: DNA sequence analysis is an important research topic in bioinformatics. Evaluating the similarity between sequences, which is crucial for sequence analysis, has attracted much research effort in the last two decades, and a dozen algorithms and tools have been developed. These methods...

Bibliographic Details
Main Authors:	Xie, Xiaojing; Guan, Jihong; Zhou, Shuigeng
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331808/ https://www.ncbi.nlm.nih.gov/pubmed/25707937 http://dx.doi.org/10.1186/1471-2164-16-S3-S5

author	Xie, Xiaojing; Guan, Jihong; Zhou, Shuigeng
collection	PubMed
description	BACKGROUND: DNA sequence analysis is an important research topic in bioinformatics. Evaluating the similarity between sequences, which is crucial for sequence analysis, has attracted much research effort in the last two decades, and a dozen algorithms and tools have been developed. These methods are based on alignment, word frequency, or geometric representation, each of which has its advantages and disadvantages. RESULTS: In this paper, to compute the similarity between DNA sequences effectively, we introduce a novel method based on frequency patterns and entropy to construct representative vectors of DNA sequences. Experiments are conducted to evaluate the proposed method, which is compared with two recently developed alignment-free methods and the BLASTN tool. When tested on the β-globin genes of 11 species, using the results from MEGA as the baseline, our method achieves higher correlation coefficients than the two alignment-free methods and the BLASTN tool. CONCLUSIONS: Our method is not only able to capture fine-granularity information (location and ordering) of DNA sequences via sequence blocking, but is also insensitive to noise and sequence rearrangement because it considers only the maximal frequent patterns. It outperforms major existing methods or tools.
format	Online Article Text
id	pubmed-4331808
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4331808 2015-03-19 BMC Genomics Proceedings BioMed Central 2015-01-29 /pmc/articles/PMC4331808/ /pubmed/25707937 http://dx.doi.org/10.1186/1471-2164-16-S3-S5 Text en Copyright © 2015 Xie et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Similarity evaluation of DNA sequences based on frequent patterns and entropy
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331808/ https://www.ncbi.nlm.nih.gov/pubmed/25707937 http://dx.doi.org/10.1186/1471-2164-16-S3-S5
