
Highly improved homopolymer aware nucleotide-protein alignments with 454 data

Bibliographic Details
Main author: Lysholm, Fredrik
Format: Online, Article, Text
Language: English
Published: BioMed Central 2012
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3568017/
https://www.ncbi.nlm.nih.gov/pubmed/22971057
http://dx.doi.org/10.1186/1471-2105-13-230
collection PubMed
description BACKGROUND: Roche 454 sequencing is the leading sequencing technology for producing long-read, high-throughput sequence data. Unlike most methods, where sequencing errors translate to base uncertainties, 454 sequencing inaccuracies create nucleotide gaps. These gaps are particularly troublesome for translated search tools such as BLASTx, where they introduce frame-shifts and result in regions of decreased identity and/or terminated alignments, which affect further analysis.
RESULTS: To address this issue, the Homopolymer Aware Cross Alignment Tool (HAXAT) was developed. HAXAT uses a novel dynamic programming algorithm for solving the optimal local alignment between a 454 nucleotide sequence and a protein sequence by allowing frame-shifts, guided by 454 flowpeak values. The algorithm is an efficient, minimal extension of the Smith-Waterman-Gotoh algorithm that easily fits into other tools. Experiments using HAXAT demonstrate that introducing 454-specific frame-shift penalties significantly increases the accuracy of alignments spanning homopolymer sequence errors. The full effect of the new parameters introduced with this novel alignment model is explored. Experimental results evaluating homopolymer inaccuracy through alignments show a two- to five-fold increase in Matthews Correlation Coefficient over previous algorithms for 454-derived data.
CONCLUSIONS: The increased accuracy provided by HAXAT not only results in improved homologue estimations but also provides uninterrupted reading frames, which greatly facilitate further analysis of protein space, for example phylogenetic analysis. The alignment tool is available at http://bioinfo.ifm.liu.se/454tools/haxat.
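The abstract describes a dynamic programming local alignment of a 454 nucleotide read against a protein that permits frame-shifts. The Python below is a minimal sketch of that general idea under stated assumptions; it is not HAXAT's algorithm. It uses linear rather than Gotoh affine gap penalties and a toy match/mismatch score instead of a substitution matrix such as BLOSUM62. Because a plain sequence string carries no 454 flowpeak values, it substitutes a hypothetical homopolymer discount on the frame-shift penalty. All function names and penalty values are illustrative.

```python
import math
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def aa_score(a, b, match=5, mismatch=-3):
    """Toy substitution score (assumption); a real tool would use e.g. BLOSUM62."""
    return match if a == b else mismatch


def skip_penalty(read, k, frameshift, hp_frameshift):
    """Penalty for a frame-shift that skips read[k].

    Hypothetical stand-in for flowpeak guidance: skipping a base that sits in a
    homopolymer run (it has an identical neighbour) is cheaper, since 454 errors
    concentrate in such runs.
    """
    in_run = (k > 0 and read[k - 1] == read[k]) or (
        k + 1 < len(read) and read[k + 1] == read[k]
    )
    return hp_frameshift if in_run else frameshift


def frameshift_local_align(read, protein, gap=8, frameshift=12, hp_frameshift=4):
    """Best local score of a nucleotide read against a protein, allowing frame-shifts.

    H[i][j] is the best score of a local alignment ending after read[:i] and protein[:j].
    """
    n, m = len(read), len(protein)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cands = [0.0]  # local alignment: a new alignment may start here
            if i >= 3:
                aa = CODON_TABLE.get(read[i - 3:i], "X")  # "X" for codons with ambiguous bases
                cands.append(H[i - 3][j - 1] + aa_score(aa, protein[j - 1]))  # codon vs residue
                cands.append(H[i - 3][j] - gap)  # codon vs gap in the protein
            cands.append(H[i][j - 1] - gap)  # residue vs gap in the read
            # Frame-shift: skip one read base (compensates an extra, overcalled base) ...
            cands.append(H[i - 1][j] - skip_penalty(read, i - 1, frameshift, hp_frameshift))
            # ... or skip two bases, which moves the frame by -1 (compensates a missing base).
            if i >= 2:
                cands.append(H[i - 2][j] - skip_penalty(read, i - 1, frameshift, hp_frameshift))
            H[i][j] = max(cands)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    protein = "MWVTFKWVTFIS"
    # Back-translation of the protein with one extra A in the AAA (Lys) codon.
    read = "ATGTGGGTTACTTTTAAA" + "A" + "TGGGTTACTTTTATTTCT"
    print(frameshift_local_align(read, protein))  # 56.0: one cheap homopolymer frame-shift
    print(frameshift_local_align(read, protein, frameshift=math.inf, hp_frameshift=math.inf))  # 35.0
```

With these toy penalties, the error-free back-translation scores 60. The read carrying one homopolymer overcall scores 56 with the default penalties and 48 when the homopolymer discount is removed (hp_frameshift=12). With frame-shifts disabled it scores only 35, because a fixed-frame alignment can cover only one side of the error. That fixed-frame case loosely corresponds to the terminated or low-identity alignments the abstract attributes to BLASTx on 454 data.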
id pubmed-3568017
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3568017; 2013-02-13; BMC Bioinformatics; Research Article; BioMed Central 2012-09-12; Text; en
Copyright ©2012 Lysholm; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.