Choosing the best heuristic for seeded alignment of DNA sequences

BACKGROUND: Seeded alignment is an important component of algorithms for fast, large-scale DNA similarity search. A good seed matching heuristic can reduce the execution time of genomic-scale sequence comparison without degrading sensitivity. Recently, many types of seed have been proposed to improv...

Bibliographic Details
Main authors: Sun, Yanni; Buhler, Jeremy
Format: Text
Language: English
Published: BioMed Central, 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1468433/
https://www.ncbi.nlm.nih.gov/pubmed/16533404
http://dx.doi.org/10.1186/1471-2105-7-133
author Sun, Yanni
Buhler, Jeremy
collection PubMed
description BACKGROUND: Seeded alignment is an important component of algorithms for fast, large-scale DNA similarity search. A good seed matching heuristic can reduce the execution time of genomic-scale sequence comparison without degrading sensitivity. Recently, many types of seed have been proposed to improve on the performance of traditional contiguous seeds as used in, e.g., NCBI BLASTN. Choosing among these seed types, particularly those that use information besides the presence or absence of matching residue pairs, requires practical guidance based on a rigorous comparison, including assessment of sensitivity, specificity, and computational efficiency. This work performs such a comparison, focusing on alignments in DNA outside widely studied coding regions. RESULTS: We compare seeds of several types, including those allowing transition mutations rather than matches at fixed positions, those allowing transitions at arbitrary positions ("BLASTZ" seeds), and those using a more general scoring matrix. For each seed type, we use an extended version of our Mandala seed design software to choose seeds with optimized sensitivity for various levels of specificity. Our results show that, on a test set biased toward alignments of noncoding DNA, transition information significantly improves seed performance, while finer distinctions between different types of mismatches do not. BLASTZ seeds perform especially well. These results depend on properties of our test set that are not shared by EST-based test sets with a strong bias toward coding DNA. CONCLUSION: Practical seed design requires careful attention to the properties of the alignments being sought. For noncoding DNA sequences, seeds that use transition information, especially BLASTZ-style seeds, are particularly useful. The Mandala seed design software can be found at .
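The seed types compared in the description above differ in which mismatches a seed position tolerates. The following Python sketch illustrates that distinction under stated assumptions: the pattern alphabet ('1' for a required match, '@' for a match or transition, '0' for an ignored position), the function names, and the example sequences are hypothetical choices made for illustration and are not taken from Mandala, BLASTZ, or the article. The default limit of one transition per seed in the second function is likewise a parameter assumed here.

```python
# Illustrative sketch only: pattern symbols, function names, and defaults are
# hypothetical and are not the Mandala or BLASTZ implementation.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a: str, b: str) -> bool:
    """True for a purine<->purine or pyrimidine<->pyrimidine substitution."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def _check_lengths(x: str, y: str, pattern: str) -> None:
    """Both windows must span exactly the seed pattern."""
    if not (len(x) == len(y) == len(pattern)):
        raise ValueError("windows and pattern must have equal length")


def fixed_seed_hit(x: str, y: str, pattern: str) -> bool:
    """Seed whose tolerance is fixed per position.

    '1' = residues must match
    '@' = residues must match or differ by a transition
    '0' = position is ignored
    """
    _check_lengths(x, y, pattern)
    for a, b, p in zip(x, y, pattern):
        if p == "1" and a != b:
            return False
        if p == "@" and a != b and not is_transition(a, b):
            return False
    return True


def any_transition_seed_hit(x: str, y: str, pattern: str,
                            max_transitions: int = 1) -> bool:
    """Seed that tolerates transitions at arbitrary '1' positions.

    Every '1' position must match, except that up to `max_transitions`
    of them (an assumed parameter) may instead hold a transition.
    """
    _check_lengths(x, y, pattern)
    transitions = 0
    for a, b, p in zip(x, y, pattern):
        if p != "1" or a == b:
            continue
        if not is_transition(a, b):
            return False  # a transversion at a required position
        transitions += 1
        if transitions > max_transitions:
            return False
    return True


if __name__ == "__main__":
    x, y = "ACGTAC", "ACATAC"  # one G/A transition at index 2
    print(fixed_seed_hit(x, y, "111111"))           # contiguous seed
    print(fixed_seed_hit(x, y, "11@111"))           # transition allowed at index 2
    print(any_transition_seed_hit(x, y, "111111"))  # transition allowed anywhere
```

In this sketch a contiguous pattern such as "111111" rejects the example pair, while both transition-aware variants accept it. The difference between the two variants is where the tolerance sits: the fixed-position seed only forgives a transition at the '@' position, whereas the arbitrary-position seed forgives it wherever it occurs, up to the assumed limit.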
format Text
id pubmed-1468433
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1468433 2006-06-07 Choosing the best heuristic for seeded alignment of DNA sequences Sun, Yanni Buhler, Jeremy BMC Bioinformatics Methodology Article BioMed Central 2006-03-13 /pmc/articles/PMC1468433/ /pubmed/16533404 http://dx.doi.org/10.1186/1471-2105-7-133 Text en Copyright © 2006 Sun and Buhler; licensee BioMed Central Ltd.
title Choosing the best heuristic for seeded alignment of DNA sequences
topic Methodology Article