Cargando…

Levenshtein Distance, Sequence Comparison and Biological Database Search

Levenshtein edit distance has played a central role—both past and present—in sequence alignment in particular and biological database similarity search in general. We start our review with a history of dynamic programming algorithms for computing Levenshtein distance and sequence alignments. Followi...

Descripción completa

Detalles Bibliográficos
Autores principales: Berger, Bonnie, Waterman, Michael S., Yu, Yun William
Formato: Online Artículo Texto
Lenguaje:English
Publicado: 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8274556/
https://www.ncbi.nlm.nih.gov/pubmed/34257466
http://dx.doi.org/10.1109/tit.2020.2996543
_version_ 1783721571640147968
author Berger, Bonnie
Waterman, Michael S.
Yu, Yun William
author_facet Berger, Bonnie
Waterman, Michael S.
Yu, Yun William
author_sort Berger, Bonnie
collection PubMed
description Levenshtein edit distance has played a central role—both past and present—in sequence alignment in particular and biological database similarity search in general. We start our review with a history of dynamic programming algorithms for computing Levenshtein distance and sequence alignments. Following, we describe how those algorithms led to heuristics employed in the most widely used software in bioinformatics, BLAST, a program to search DNA and protein databases for evolutionarily relevant similarities. More recently, the advent of modern genomic sequencing and the volume of data it generates has resulted in a return to the problem of local alignment. We conclude with how the mathematical formulation of Levenshtein distance as a metric made possible additional optimizations to similarity search in biological contexts. These modern optimizations are built around the low metric entropy and fractional dimensionality of biological databases, enabling orders of magnitude acceleration of biological similarity search.
format Online
Article
Text
id pubmed-8274556
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-82745562021-07-12 Levenshtein Distance, Sequence Comparison and Biological Database Search Berger, Bonnie Waterman, Michael S. Yu, Yun William IEEE Trans Inf Theory Article Levenshtein edit distance has played a central role—both past and present—in sequence alignment in particular and biological database similarity search in general. We start our review with a history of dynamic programming algorithms for computing Levenshtein distance and sequence alignments. Following, we describe how those algorithms led to heuristics employed in the most widely used software in bioinformatics, BLAST, a program to search DNA and protein databases for evolutionarily relevant similarities. More recently, the advent of modern genomic sequencing and the volume of data it generates has resulted in a return to the problem of local alignment. We conclude with how the mathematical formulation of Levenshtein distance as a metric made possible additional optimizations to similarity search in biological contexts. These modern optimizations are built around the low metric entropy and fractional dimensionality of biological databases, enabling orders of magnitude acceleration of biological similarity search. 2020-05-21 2021-06 /pmc/articles/PMC8274556/ /pubmed/34257466 http://dx.doi.org/10.1109/tit.2020.2996543 Text en https://creativecommons.org/licenses/by/4.0/This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
spellingShingle Article
Berger, Bonnie
Waterman, Michael S.
Yu, Yun William
Levenshtein Distance, Sequence Comparison and Biological Database Search
title Levenshtein Distance, Sequence Comparison and Biological Database Search
title_full Levenshtein Distance, Sequence Comparison and Biological Database Search
title_fullStr Levenshtein Distance, Sequence Comparison and Biological Database Search
title_full_unstemmed Levenshtein Distance, Sequence Comparison and Biological Database Search
title_short Levenshtein Distance, Sequence Comparison and Biological Database Search
title_sort levenshtein distance, sequence comparison and biological database search
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8274556/
https://www.ncbi.nlm.nih.gov/pubmed/34257466
http://dx.doi.org/10.1109/tit.2020.2996543
work_keys_str_mv AT bergerbonnie levenshteindistancesequencecomparisonandbiologicaldatabasesearch
AT watermanmichaels levenshteindistancesequencecomparisonandbiologicaldatabasesearch
AT yuyunwilliam levenshteindistancesequencecomparisonandbiologicaldatabasesearch