A basic analysis toolkit for biological sequences

This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks, namely local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. Moreover, it also supports filte...

Bibliographic Details
Main authors: Giancarlo, Raffaele; Siragusa, Alessandro; Siragusa, Enrico; Utro, Filippo
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2147010/
https://www.ncbi.nlm.nih.gov/pubmed/17877802
http://dx.doi.org/10.1186/1748-7188-2-10
author Giancarlo, Raffaele
Siragusa, Alessandro
Siragusa, Enrico
Utro, Filippo
collection PubMed
description This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks: local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. It also supports filtering operations to select strings from a set and establish their statistical significance, via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not been implemented in a single and consistent software package, as we do here. Therefore, our main contribution is to fill this gap between algorithmic theory and practice by providing an extensible and easy-to-use software library that includes algorithms for the mentioned string matching and alignment problems. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at under the GNU GPL.
format Text
id pubmed-2147010
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2147010 2007-12-19
A basic analysis toolkit for biological sequences
Giancarlo, Raffaele
Siragusa, Alessandro
Siragusa, Enrico
Utro, Filippo
Algorithms Mol Biol
Software Article
BioMed Central 2007-09-18
/pmc/articles/PMC2147010/
/pubmed/17877802
http://dx.doi.org/10.1186/1748-7188-2-10
Text
en
Copyright © 2007 Giancarlo et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A basic analysis toolkit for biological sequences
topic Software Article