Cargando…

Levenshtein Distance, Sequence Comparison and Biological Database Search

Levenshtein edit distance has played a central role—both past and present—in sequence alignment in particular and biological database similarity search in general. We start our review with a history of dynamic programming algorithms for computing Levenshtein distance and sequence alignments. Followi...

Descripción completa

Detalles Bibliográficos
Autores principales: Berger, Bonnie, Waterman, Michael S., Yu, Yun William
Formato: Online Artículo Texto
Lenguaje:English
Publicado: 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8274556/
https://www.ncbi.nlm.nih.gov/pubmed/34257466
http://dx.doi.org/10.1109/tit.2020.2996543
Descripción
Sumario:Levenshtein edit distance has played a central role—both past and present—in sequence alignment in particular and biological database similarity search in general. We start our review with a history of dynamic programming algorithms for computing Levenshtein distance and sequence alignments. Following, we describe how those algorithms led to heuristics employed in the most widely used software in bioinformatics, BLAST, a program to search DNA and protein databases for evolutionarily relevant similarities. More recently, the advent of modern genomic sequencing and the volume of data it generates has resulted in a return to the problem of local alignment. We conclude with how the mathematical formulation of Levenshtein distance as a metric made possible additional optimizations to similarity search in biological contexts. These modern optimizations are built around the low metric entropy and fractional dimensionality of biological databases, enabling orders of magnitude acceleration of biological similarity search.