Hit integration for identifying optimal spaced seeds

BACKGROUND: The introduction of spaced seeds made it possible to improve sensitivity in homology search without loss of search speed. Since then, efforts to find the optimal seed, the one that maximizes sensitivity, have continued. The sensitivity of a seed is generally computed by its hit pro...

Bibliographic Details
Main Authors: Chung, Won-Hyoung, Park, Seong-Bae
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009509/
https://www.ncbi.nlm.nih.gov/pubmed/20122210
http://dx.doi.org/10.1186/1471-2105-11-S1-S37
_version_ 1782194695032537088
author Chung, Won-Hyoung
Park, Seong-Bae
author_facet Chung, Won-Hyoung
Park, Seong-Bae
author_sort Chung, Won-Hyoung
collection PubMed
description BACKGROUND: The introduction of spaced seeds made it possible to improve sensitivity in homology search without loss of search speed. Since then, efforts to find the optimal seed, the one that maximizes sensitivity, have continued. The sensitivity of a seed is generally computed as its hit probability. However, hit probability measures sensitivity only at one specific similarity level, whereas homologous regions are usually distributed over various similarity levels. As a result, the optimal seed found by hit probability is not actually optimal across various similarity levels. A new measure of seed sensitivity is therefore required to recommend seeds that are robust to various similarity levels. RESULTS: We propose a new probability model of sensitivity, hit integration, which covers a range of similarity levels of homologous regions. We propose a novel algorithm for computing hit integration, based on integrating hit probabilities over a range of similarity levels. We also prove that hit integration is computable by expressing its integral part as a recursive formula that can be solved easily by dynamic programming. Experimental results on biological data show that hit integration finds better seeds than those of PatternHunter. CONCLUSION: The presented model generalizes hit probability as an estimate of sensitivity by relaxing the fixed similarity level. We propose a novel algorithm that directly computes the sensitivity over a range of similarity levels.
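The idea summarized in the description can be illustrated with a minimal sketch. It is not the authors' algorithm: the record says they evaluate the integral through a recursive formula solved by dynamic programming, whereas this sketch computes the hit probability by a standard dynamic program over the last columns of the alignment and then integrates it numerically with Simpson's rule. The function names `hit_probability` and `hit_integration`, the seed notation (`1` = must match, `0` = don't care), the region length of 16, and the similarity range [0.6, 0.9] are all assumptions made for illustration.

```python
def hit_probability(seed: str, length: int, p: float) -> float:
    """Probability that `seed` hits at least once in a random region.

    The region has `length` columns, each a match independently with
    probability `p`. The seed is a string of '1' (must match) and
    '0' (don't care) that starts and ends with '1'.
    """
    if not seed or seed[0] != "1" or seed[-1] != "1" or set(seed) - {"0", "1"}:
        raise ValueError("seed must be a 0/1 string starting and ending with '1'")
    span = len(seed)
    if length < span:
        return 0.0
    # Bit span-1 is the oldest column of the window, bit 0 the newest.
    seed_mask = sum(1 << (span - 1 - i) for i, c in enumerate(seed) if c == "1")
    state_mask = (1 << (span - 1)) - 1
    # no_hit[s]: probability that no hit has occurred and the last span-1
    # columns equal s. Starting from all-mismatch padding is safe because
    # the seed begins with '1', so no window can hit inside the padding.
    no_hit = {0: 1.0}
    for _ in range(length):
        nxt = {}
        for state, prob in no_hit.items():
            for bit, q in ((1, p), (0, 1.0 - p)):
                window = (state << 1) | bit
                if (window & seed_mask) == seed_mask:
                    continue  # a hit: this mass leaves the no-hit set
                key = window & state_mask
                nxt[key] = nxt.get(key, 0.0) + prob * q
        no_hit = nxt
    return 1.0 - sum(no_hit.values())


def hit_integration(seed: str, length: int, p_lo: float, p_hi: float,
                    steps: int = 100) -> float:
    """Integral of the hit probability over similarity levels [p_lo, p_hi].

    Numerical stand-in (composite Simpson's rule), not the recursive
    formula referred to in the record.
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = (p_hi - p_lo) / steps
    total = hit_probability(seed, length, p_lo) + hit_probability(seed, length, p_hi)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * hit_probability(seed, length, p_lo + k * h)
    return total * h / 3.0


if __name__ == "__main__":
    # Two hypothetical weight-4 seeds, compared at one similarity level
    # and over a range of levels.
    for s in ("1111", "110101"):
        print(s, hit_probability(s, 16, 0.7), hit_integration(s, 16, 0.6, 0.9))
```

The state space grows as 2^(span-1), so this sketch is practical only for short seeds. Ranking seeds by `hit_integration` instead of by `hit_probability` at a single `p` is the comparison the description motivates.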
format Text
id pubmed-3009509
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3009509 2010-12-23 Hit integration for identifying optimal spaced seeds Chung, Won-Hyoung Park, Seong-Bae BMC Bioinformatics Research BioMed Central 2010-01-18 /pmc/articles/PMC3009509/ /pubmed/20122210 http://dx.doi.org/10.1186/1471-2105-11-S1-S37 Text en Copyright ©2010 Chung and Park; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Chung, Won-Hyoung
Park, Seong-Bae
Hit integration for identifying optimal spaced seeds
title Hit integration for identifying optimal spaced seeds
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009509/
https://www.ncbi.nlm.nih.gov/pubmed/20122210
http://dx.doi.org/10.1186/1471-2105-11-S1-S37