
Kangaroo – A pattern-matching program for biological sequences

BACKGROUND: Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription...


Bibliographic Details
Main authors: Betel, Doron; Hogue, Christopher WV
Format: Text
Language: English
Published: BioMed Central 2002
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC119856/
https://www.ncbi.nlm.nih.gov/pubmed/12150718
http://dx.doi.org/10.1186/1471-2105-3-20
_version_ 1782120297442312192
author Betel, Doron
Hogue, Christopher WV
author_facet Betel, Doron
Hogue, Christopher WV
author_sort Betel, Doron
collection PubMed
description BACKGROUND: Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells. RESULTS: Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. CONCLUSION: A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats.
format Text
id pubmed-119856
institution National Center for Biotechnology Information
language English
publishDate 2002
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1198562002-09-04 Kangaroo – A pattern-matching program for biological sequences Betel, Doron Hogue, Christopher WV BMC Bioinformatics Methodology article BACKGROUND: Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells. RESULTS: Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. CONCLUSION: A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats. BioMed Central 2002-07-31 /pmc/articles/PMC119856/ /pubmed/12150718 http://dx.doi.org/10.1186/1471-2105-3-20 Text en Copyright ©2002 Betel and Hogue; licensee BioMed Central Ltd. 
This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
spellingShingle Methodology article
Betel, Doron
Hogue, Christopher WV
Kangaroo – A pattern-matching program for biological sequences
title Kangaroo – A pattern-matching program for biological sequences
title_full Kangaroo – A pattern-matching program for biological sequences
title_fullStr Kangaroo – A pattern-matching program for biological sequences
title_full_unstemmed Kangaroo – A pattern-matching program for biological sequences
title_short Kangaroo – A pattern-matching program for biological sequences
title_sort kangaroo – a pattern-matching program for biological sequences
topic Methodology article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC119856/
https://www.ncbi.nlm.nih.gov/pubmed/12150718
http://dx.doi.org/10.1186/1471-2105-3-20
work_keys_str_mv AT beteldoron kangarooapatternmatchingprogramforbiologicalsequences
AT hoguechristopherwv kangarooapatternmatchingprogramforbiologicalsequences
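
The description above mentions searching coding regions for mononucleotide repeats with a regular expression. The following is a minimal sketch of that kind of query using Python's standard `re` module. It is not Kangaroo's code or query syntax; the function name `find_mononucleotide_repeats`, the `min_length` default of 8, and the toy sequence are all assumptions made for illustration.

```python
import re


def find_mononucleotide_repeats(sequence, min_length=8):
    """Return (start, end, base, length) for every run of a single
    nucleotide repeated at least min_length times.

    Hypothetical helper for illustration; the threshold is an assumption,
    not a value taken from the Kangaroo paper.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # ([ACGT]) captures one base; \1{n,} requires that same base to
    # follow at least n more times, so the whole run is >= min_length.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        start, end = match.span()
        hits.append((start, end, match.group(1), end - start))
    return hits


if __name__ == "__main__":
    # Toy sequence (made up): a run of eight A's flanked by other bases.
    toy = "ATG" + "A" * 8 + "GC"
    for start, end, base, length in find_mononucleotide_repeats(toy):
        print(f"{base} x {length} at [{start}, {end})")
```

Coordinates are 0-based and half-open, following Python's slicing convention, so `sequence[start:end]` recovers the matched run.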