Similarity Estimation Between DNA Sequences Based on Local Pattern Histograms of Binary Images

Graphical representation of DNA sequences is one of the most popular techniques for alignment-free sequence comparison. Here, we propose a new method for the feature extraction of DNA sequences represented by binary images, by estimating the similarity between DNA sequences using the frequency histo...

Bibliographic Details
Main authors: Kobori, Yusei; Mizuta, Satoshi
Format: Online Article Text
Language: English
Published: Elsevier 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4880953/
https://www.ncbi.nlm.nih.gov/pubmed/27132143
http://dx.doi.org/10.1016/j.gpb.2015.09.007
_version_ 1782433879978672128
author Kobori, Yusei
Mizuta, Satoshi
author_facet Kobori, Yusei
Mizuta, Satoshi
author_sort Kobori, Yusei
collection PubMed
description Graphical representation of DNA sequences is one of the most popular techniques for alignment-free sequence comparison. Here, we propose a new method for the feature extraction of DNA sequences represented by binary images, by estimating the similarity between DNA sequences using the frequency histograms of local bitmap patterns of images. Our method shows linear time complexity for the length of DNA sequences, which is practical even when long sequences, such as whole genome sequences, are compared. We tested five distance measures for the estimation of sequence similarities, and found that the histogram intersection and Manhattan distance are the most appropriate ones for phylogenetic analyses.
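The description names two distance measures, histogram intersection and Manhattan distance, applied to frequency histograms of local bitmap patterns. The following is a minimal Python sketch of that idea, not the authors' implementation. The 2x2 window size, the function names, and the two tiny hand-written images are assumptions made for illustration only; the record does not say how a DNA sequence is drawn as a binary image or which pattern size the paper uses.

```python
from collections import Counter
from itertools import product


def local_pattern_histogram(image, size=2):
    """Relative frequency of every size x size bit pattern in a binary image.

    `image` is a list of equal-length rows of 0/1 values. The window size
    is an illustrative assumption, not a value taken from the paper.
    """
    rows, cols = len(image), len(image[0])
    counts = Counter()
    # Slide the window over every position: one pass over the image.
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            pattern = tuple(
                image[r + dr][c + dc]
                for dr in range(size)
                for dc in range(size)
            )
            counts[pattern] += 1
    total = sum(counts.values())
    # Report all 2**(size*size) patterns so histograms share the same keys.
    return {p: counts[p] / total for p in product((0, 1), repeat=size * size)}


def histogram_intersection_distance(h1, h2):
    """1 minus the overlap of two normalized histograms."""
    return 1.0 - sum(min(h1[p], h2[p]) for p in h1)


def manhattan_distance(h1, h2):
    """Sum of absolute differences between two histograms."""
    return sum(abs(h1[p] - h2[p]) for p in h1)


# Hypothetical 4x4 binary images standing in for two sequence plots.
img_a = [
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
img_b = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
]

hist_a = local_pattern_histogram(img_a)
hist_b = local_pattern_histogram(img_b)
print(histogram_intersection_distance(hist_a, hist_b))
print(manhattan_distance(hist_a, hist_b))
```

For histograms that each sum to 1, the Manhattan distance equals twice the histogram intersection distance, so under this normalization the two measures rank pairs of images identically.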
format Online
Article
Text
id pubmed-4880953
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-4880953
2016-06-06
Similarity Estimation Between DNA Sequences Based on Local Pattern Histograms of Binary Images
Kobori, Yusei
Mizuta, Satoshi
Genomics Proteomics Bioinformatics
Method
Elsevier 2016-04
2016-04-27
/pmc/articles/PMC4880953/
/pubmed/27132143
http://dx.doi.org/10.1016/j.gpb.2015.09.007
Text
en
© 2016 The Authors. Production and hosting by Elsevier B.V. on behalf of Beijing Institute of Genomics, Chinese Academy of Sciences and Genetics Society of China. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Method
Kobori, Yusei
Mizuta, Satoshi
Similarity Estimation Between DNA Sequences Based on Local Pattern Histograms of Binary Images
title Similarity Estimation Between DNA Sequences Based on Local Pattern Histograms of Binary Images
topic Method