Choosing the best heuristic for seeded alignment of DNA sequences
BACKGROUND: Seeded alignment is an important component of algorithms for fast, large-scale DNA similarity search. A good seed matching heuristic can reduce the execution time of genomic-scale sequence comparison without degrading sensitivity. Recently, many types of seed have been proposed to improv...
Main authors: | Sun, Yanni; Buhler, Jeremy |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2006 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1468433/ https://www.ncbi.nlm.nih.gov/pubmed/16533404 http://dx.doi.org/10.1186/1471-2105-7-133 |
_version_ | 1782127563919851520 |
---|---|
author | Sun, Yanni Buhler, Jeremy |
author_facet | Sun, Yanni Buhler, Jeremy |
author_sort | Sun, Yanni |
collection | PubMed |
description | BACKGROUND: Seeded alignment is an important component of algorithms for fast, large-scale DNA similarity search. A good seed matching heuristic can reduce the execution time of genomic-scale sequence comparison without degrading sensitivity. Recently, many types of seed have been proposed to improve on the performance of traditional contiguous seeds as used in, e.g., NCBI BLASTN. Choosing among these seed types, particularly those that use information besides the presence or absence of matching residue pairs, requires practical guidance based on a rigorous comparison, including assessment of sensitivity, specificity, and computational efficiency. This work performs such a comparison, focusing on alignments in DNA outside widely studied coding regions. RESULTS: We compare seeds of several types, including those allowing transition mutations rather than matches at fixed positions, those allowing transitions at arbitrary positions ("BLASTZ" seeds), and those using a more general scoring matrix. For each seed type, we use an extended version of our Mandala seed design software to choose seeds with optimized sensitivity for various levels of specificity. Our results show that, on a test set biased toward alignments of noncoding DNA, transition information significantly improves seed performance, while finer distinctions between different types of mismatches do not. BLASTZ seeds perform especially well. These results depend on properties of our test set that are not shared by EST-based test sets with a strong bias toward coding DNA. CONCLUSION: Practical seed design requires careful attention to the properties of the alignments being sought. For noncoding DNA sequences, seeds that use transition information, especially BLASTZ-style seeds, are particularly useful. The Mandala seed design software can be found at . |
format | Text |
id | pubmed-1468433 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1468433 2006-06-07 Choosing the best heuristic for seeded alignment of DNA sequences Sun, Yanni Buhler, Jeremy BMC Bioinformatics Methodology Article BioMed Central 2006-03-13 /pmc/articles/PMC1468433/ /pubmed/16533404 http://dx.doi.org/10.1186/1471-2105-7-133 Text en Copyright © 2006 Sun and Buhler; licensee BioMed Central Ltd. |
spellingShingle | Methodology Article Sun, Yanni Buhler, Jeremy Choosing the best heuristic for seeded alignment of DNA sequences |
title | Choosing the best heuristic for seeded alignment of DNA sequences |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1468433/ https://www.ncbi.nlm.nih.gov/pubmed/16533404 http://dx.doi.org/10.1186/1471-2105-7-133 |
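
The seed types named in the description (contiguous seeds, seeds that allow a transition instead of a match at fixed positions, and "BLASTZ" seeds that allow transitions at arbitrary positions) can be illustrated with a small sketch. This is not the Mandala software or the authors' method; the pattern alphabet (`1`, `T`, `0`), the function name `seed_hits`, and the `max_free_transitions` parameter are all assumptions made up for this illustration.

```python
# Illustrative sketch only; names and pattern encoding are hypothetical.
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def is_transition(a: str, b: str) -> bool:
    """Return True if the residue pair (a, b) is a transition mutation."""
    return (a, b) in TRANSITIONS


def seed_hits(x: str, y: str, pattern: str, max_free_transitions: int = 0) -> bool:
    """Check whether two equal-length DNA windows match under a seed pattern.

    Pattern characters (assumed encoding):
      '1'  the residues must match
      'T'  the residues must match or form a transition (fixed position)
      '0'  don't-care position
    max_free_transitions is the number of transitions tolerated at '1'
    positions, mimicking seeds that allow transitions at arbitrary positions.
    """
    if not (len(x) == len(y) == len(pattern)):
        raise ValueError("windows and pattern must have the same length")
    free_used = 0
    for a, b, p in zip(x, y, pattern):
        if p == "0" or a == b:
            continue  # ignored position, or an exact match
        if not is_transition(a, b):
            return False  # a transversion never satisfies '1' or 'T'
        if p == "T":
            continue  # transition permitted at this fixed position
        free_used += 1  # transition at a '1' position
        if free_used > max_free_transitions:
            return False
    return True
```

With this encoding, a contiguous seed is a pattern of all `1`s, a spaced seed mixes `1` and `0`, a fixed-transition seed adds `T` positions, and a BLASTZ-style seed is approximated by setting `max_free_transitions=1` on an ordinary pattern.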