Cargando…
Levenshtein Distance, Sequence Comparison and Biological Database Search
Levenshtein edit distance has played a central role—both past and present—in sequence alignment in particular and biological database similarity search in general. We start our review with a history of dynamic programming algorithms for computing Levenshtein distance and sequence alignments. Followi...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
2020
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8274556/ https://www.ncbi.nlm.nih.gov/pubmed/34257466 http://dx.doi.org/10.1109/tit.2020.2996543 |
_version_ | 1783721571640147968 |
---|---|
author | Berger, Bonnie Waterman, Michael S. Yu, Yun William |
author_facet | Berger, Bonnie Waterman, Michael S. Yu, Yun William |
author_sort | Berger, Bonnie |
collection | PubMed |
description | Levenshtein edit distance has played a central role—both past and present—in sequence alignment in particular and biological database similarity search in general. We start our review with a history of dynamic programming algorithms for computing Levenshtein distance and sequence alignments. Following, we describe how those algorithms led to heuristics employed in the most widely used software in bioinformatics, BLAST, a program to search DNA and protein databases for evolutionarily relevant similarities. More recently, the advent of modern genomic sequencing and the volume of data it generates has resulted in a return to the problem of local alignment. We conclude with how the mathematical formulation of Levenshtein distance as a metric made possible additional optimizations to similarity search in biological contexts. These modern optimizations are built around the low metric entropy and fractional dimensionality of biological databases, enabling orders of magnitude acceleration of biological similarity search. |
format | Online Article Text |
id | pubmed-8274556 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-82745562021-07-12 Levenshtein Distance, Sequence Comparison and Biological Database Search Berger, Bonnie Waterman, Michael S. Yu, Yun William IEEE Trans Inf Theory Article Levenshtein edit distance has played a central role—both past and present—in sequence alignment in particular and biological database similarity search in general. We start our review with a history of dynamic programming algorithms for computing Levenshtein distance and sequence alignments. Following, we describe how those algorithms led to heuristics employed in the most widely used software in bioinformatics, BLAST, a program to search DNA and protein databases for evolutionarily relevant similarities. More recently, the advent of modern genomic sequencing and the volume of data it generates has resulted in a return to the problem of local alignment. We conclude with how the mathematical formulation of Levenshtein distance as a metric made possible additional optimizations to similarity search in biological contexts. These modern optimizations are built around the low metric entropy and fractional dimensionality of biological databases, enabling orders of magnitude acceleration of biological similarity search. 2020-05-21 2021-06 /pmc/articles/PMC8274556/ /pubmed/34257466 http://dx.doi.org/10.1109/tit.2020.2996543 Text en https://creativecommons.org/licenses/by/4.0/This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Article Berger, Bonnie Waterman, Michael S. Yu, Yun William Levenshtein Distance, Sequence Comparison and Biological Database Search |
title | Levenshtein Distance, Sequence Comparison and Biological Database Search |
title_full | Levenshtein Distance, Sequence Comparison and Biological Database Search |
title_fullStr | Levenshtein Distance, Sequence Comparison and Biological Database Search |
title_full_unstemmed | Levenshtein Distance, Sequence Comparison and Biological Database Search |
title_short | Levenshtein Distance, Sequence Comparison and Biological Database Search |
title_sort | levenshtein distance, sequence comparison and biological database search |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8274556/ https://www.ncbi.nlm.nih.gov/pubmed/34257466 http://dx.doi.org/10.1109/tit.2020.2996543 |
work_keys_str_mv | AT bergerbonnie levenshteindistancesequencecomparisonandbiologicaldatabasesearch AT watermanmichaels levenshteindistancesequencecomparisonandbiologicaldatabasesearch AT yuyunwilliam levenshteindistancesequencecomparisonandbiologicaldatabasesearch |