
Local alignment of generalized k-base encoded DNA sequence

BACKGROUND: DNA sequence comparison is a well-studied problem, in which two DNA sequences are compared using a weighted edit distance. Recent DNA sequencing technologies, however, observe an encoded form of the sequence, rather than each DNA base individually. The encoded DNA sequence may contain tech...


Bibliographic Details
Main Authors: Homer, Nils; Nelson, Stanley F; Merriman, Barry
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2911458/
https://www.ncbi.nlm.nih.gov/pubmed/20576157
http://dx.doi.org/10.1186/1471-2105-11-347
author Homer, Nils
Nelson, Stanley F
Merriman, Barry
collection PubMed
description BACKGROUND: DNA sequence comparison is a well-studied problem, in which two DNA sequences are compared using a weighted edit distance. Recent DNA sequencing technologies, however, observe an encoded form of the sequence, rather than each DNA base individually. The encoded DNA sequence may contain technical errors, and therefore encoded sequencing errors must be incorporated when comparing an encoded DNA sequence to a reference DNA sequence. RESULTS: Although two-base encoding is currently used in practice, many other encoding schemes are possible, whereby two or more bases are encoded at a time. A generalized k-base encoding scheme is presented, whereby feasible higher-order encodings are better able to differentiate errors in the encoded sequence from true DNA sequence variants. A generalized version of the previous two-base encoding DNA sequence comparison algorithm is used to compare a k-base encoded sequence to a DNA reference sequence. Finally, simulations are performed to evaluate the power, the false positive and false negative SNP discovery rates, and the performance time of k-base encoding compared to previous methods as well as to the standard DNA sequence comparison algorithm. CONCLUSIONS: The novel generalized k-base encoding scheme and resulting local alignment algorithm permit the development of higher-fidelity ligation-based next-generation sequencing technology. This bioinformatic solution affords greater robustness to errors, as well as lower false SNP discovery rates, only at the cost of computational time.
format Text
id pubmed-2911458
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2911458 2010-07-29 Local alignment of generalized k-base encoded DNA sequence Homer, Nils; Nelson, Stanley F; Merriman, Barry BMC Bioinformatics Research Article BioMed Central 2010-06-24 /pmc/articles/PMC2911458/ /pubmed/20576157 http://dx.doi.org/10.1186/1471-2105-11-347 Text en Copyright ©2010 Homer et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Local alignment of generalized k-base encoded DNA sequence
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2911458/
https://www.ncbi.nlm.nih.gov/pubmed/20576157
http://dx.doi.org/10.1186/1471-2105-11-347
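
Illustrative note on the description field: the abstract states that two-base encoding is used in practice and that an encoding scheme should help separate errors in the encoded sequence from true sequence variants. The sketch below is not taken from the article. It assumes a commonly used two-base ("color") convention in which bases are mapped to 2-bit codes (A=0, C=1, G=2, T=3) and each encoded symbol is the XOR of two adjacent codes. All function names are hypothetical. It shows that a single-base variant alters two adjacent encoded symbols, whereas a single error in the encoded sequence alters one symbol but corrupts every base decoded after it.

```python
# Assumed convention (not stated in this record): 2-bit base codes,
# encoded symbol = XOR of the codes of two adjacent bases.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_two_base(seq):
    """Return one encoded symbol per adjacent pair of bases."""
    codes = [BASE_TO_CODE[b] for b in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode_two_base(first_base, colors):
    """Rebuild the bases from a known first base and the encoded symbols."""
    code = BASE_TO_CODE[first_base]
    out = [first_base]
    for c in colors:
        code ^= c  # each symbol gives the transition to the next base
        out.append(CODE_TO_BASE[code])
    return "".join(out)


def differing_positions(xs, ys):
    """Indices at which two equal-length sequences disagree."""
    return [i for i, (x, y) in enumerate(zip(xs, ys)) if x != y]


reference = "ACGTACGT"
ref_colors = encode_two_base(reference)

# A true variant at base index 3 (T -> G) changes two adjacent symbols.
snp_colors = encode_two_base("ACGGACGT")
snp_changes = differing_positions(ref_colors, snp_colors)

# A single error in the encoded sequence changes one symbol, but naive
# decoding then disagrees with the reference at every later base.
err_colors = list(ref_colors)
err_colors[3] ^= 1
err_decoded = decode_two_base(reference[0], err_colors)
err_changes = differing_positions(reference, err_decoded)

print(snp_changes, err_changes)
```

Under this assumed convention, an aligner that works on the encoded symbols can treat an isolated symbol mismatch as a likely technical error and a consistent pair of adjacent mismatches as a candidate variant. The k-base generalization described in the abstract is not reproduced here.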