
iTriplet, a rule-based nucleic acid sequence motif finder

BACKGROUND: With the advent of high-throughput sequencing techniques, large amounts of sequencing data are readily available for analysis. Natural biological signals are intrinsically highly variable, making their complete identification a computationally challenging problem. Many attempts in using s...


Bibliographic Details
Main Authors: Ho, Eric S, Jakubowski, Christopher D, Gunderson, Samuel I
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2784457/
https://www.ncbi.nlm.nih.gov/pubmed/19874606
http://dx.doi.org/10.1186/1748-7188-4-14
_version_ 1782174749446635520
author Ho, Eric S
Jakubowski, Christopher D
Gunderson, Samuel I
collection PubMed
description BACKGROUND: With the advent of high-throughput sequencing techniques, large amounts of sequencing data are readily available for analysis. Natural biological signals are intrinsically highly variable, making their complete identification a computationally challenging problem. Many attempts using statistical or combinatorial approaches have been made, with great success, in the past. However, identifying highly degenerate and long (>20 nucleotides) motifs remains an unmet challenge, as high degeneracy diminishes the statistical significance of biological signals and increasing motif size causes combinatorial explosion. In this report, we present a novel rule-based method that is focused on finding degenerate and long motifs. Our proposed method, named iTriplet, avoids the costly enumeration present in existing combinatorial methods and is amenable to parallel processing. RESULTS: We have conducted a comprehensive assessment of the performance and sensitivity-specificity of iTriplet in analyzing artificial and real biological sequences in various genomic regions. The results show that iTriplet is able to solve challenging cases. Furthermore, we have confirmed the utility of iTriplet by showing, with a dual Luciferase reporter assay, that it accurately predicts polyA-site-related motifs. CONCLUSION: iTriplet is a novel rule-based combinatorial or enumerative motif-finding method that is able to process highly degenerate and long motifs that have resisted analysis by other methods. In addition, iTriplet is distinguished from other methods of the same family by its parallelizability, which allows it to leverage the power of today's readily available high-performance computing systems.
format Text
id pubmed-2784457
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2784457 2009-11-27 iTriplet, a rule-based nucleic acid sequence motif finder Ho, Eric S Jakubowski, Christopher D Gunderson, Samuel I Algorithms Mol Biol Research
BioMed Central 2009-10-29 /pmc/articles/PMC2784457/ /pubmed/19874606 http://dx.doi.org/10.1186/1748-7188-4-14 Text en Copyright ©2009 Ho et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title iTriplet, a rule-based nucleic acid sequence motif finder
topic Research