
Logarithmic gap costs decrease alignment accuracy

BACKGROUND: Studies on the distribution of indel sizes have consistently found that they obey a power law. This finding has led several scientists to propose that logarithmic gap costs, G(k) = a + c ln k, are more biologically realistic than affine gap costs, G(k) = a + bk, for sequence alignment...
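
To make the two gap cost formulas quoted above concrete, the following is a minimal sketch that evaluates G(k) = a + bk and G(k) = a + c ln k for a few gap lengths k. The function names and the parameter values (a = 4, b = 1, c = 1) are illustrative assumptions and are not taken from the article.

```python
import math


def affine_gap_cost(k, a, b):
    """Affine gap cost G(k) = a + b*k for a gap of length k >= 1."""
    if k < 1:
        raise ValueError("gap length k must be at least 1")
    return a + b * k


def logarithmic_gap_cost(k, a, c):
    """Logarithmic gap cost G(k) = a + c*ln(k) for a gap of length k >= 1."""
    if k < 1:
        raise ValueError("gap length k must be at least 1")
    return a + c * math.log(k)


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    a, b, c = 4.0, 1.0, 1.0
    for k in (1, 2, 10, 100):
        affine = affine_gap_cost(k, a, b)
        logarithmic = logarithmic_gap_cost(k, a, c)
        print(f"k={k:>3}  affine={affine:7.2f}  logarithmic={logarithmic:7.2f}")
```

Under these assumed parameters the affine cost grows by a constant b for every added gap position, whereas the logarithmic cost grows ever more slowly, so long gaps become comparatively cheap.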


Bibliographic Details
Main author: Cartwright, Reed A
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1770940/
https://www.ncbi.nlm.nih.gov/pubmed/17147805
http://dx.doi.org/10.1186/1471-2105-7-527
author Cartwright, Reed A
collection PubMed
description BACKGROUND: Studies on the distribution of indel sizes have consistently found that they obey a power law. This finding has led several scientists to propose that logarithmic gap costs, G(k) = a + c ln k, are more biologically realistic than affine gap costs, G(k) = a + bk, for sequence alignment. Since quick and efficient affine costs are currently the most popular way to globally align sequences, the goal of this paper is to determine whether logarithmic gap costs improve alignment accuracy significantly enough to merit their use over the faster affine gap costs. RESULTS: A group of simulated sequence pairs were globally aligned using affine, logarithmic, and log-affine gap costs. Alignment accuracy was calculated by comparing the resulting alignments to the actual alignments of the sequence pairs. Gap costs were then compared based on average alignment accuracy. Log-affine gap costs had the best accuracy, followed closely by affine gap costs, while logarithmic gap costs performed poorly. Subsequently, a model was developed to explain the results. CONCLUSION: In contrast to initial expectations, logarithmic gap costs produce poor alignments and are actually not implied by the power-law behavior of gap sizes, given typical match and mismatch costs. Furthermore, affine gap costs not only produce accurate alignments but are also good approximations to biologically realistic gap costs. This work provides added confidence for the biological relevance of existing alignment algorithms.
format Text
id pubmed-1770940
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1770940 2007-01-22 Logarithmic gap costs decrease alignment accuracy Cartwright, Reed A BMC Bioinformatics Methodology Article BioMed Central 2006-12-05 /pmc/articles/PMC1770940/ /pubmed/17147805 http://dx.doi.org/10.1186/1471-2105-7-527 Text en Copyright © 2006 Cartwright; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Logarithmic gap costs decrease alignment accuracy
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1770940/
https://www.ncbi.nlm.nih.gov/pubmed/17147805
http://dx.doi.org/10.1186/1471-2105-7-527