
Pegasys: software for executing and integrating analyses of biological sequences



Bibliographic Details
Main authors: Shah, Sohrab P, He, David YM, Sawkins, Jessica N, Druce, Jeffrey C, Quon, Gerald, Lett, Drew, Zheng, Grace XY, Xu, Tao, Ouellette, BF Francis
Format: Text
Language: English
Published: BioMed Central 2004
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC406494/
https://www.ncbi.nlm.nih.gov/pubmed/15096276
http://dx.doi.org/10.1186/1471-2105-5-40
collection PubMed
description BACKGROUND: We present Pegasys, a flexible, modular and customizable software system that facilitates the execution of heterogeneous biological sequence analysis tools and the integration of their data. RESULTS: The Pegasys system includes numerous tools for pairwise and multiple sequence alignment, ab initio gene prediction, RNA gene detection and masking of repetitive sequences in genomic DNA, as well as filters for database formatting and for processing raw output from various analysis tools. We introduce a novel data structure for creating workflows of sequence analyses and a unified data model to store their results. The software allows users to create analysis workflows dynamically at run-time by manipulating a graphical user interface. All analyses that are not serially dependent are executed in parallel on a compute cluster for efficiency of data generation. The uniform data model and backend relational database management system of Pegasys allow results of the heterogeneous programs included in a workflow to be integrated and exported into General Feature Format (GFF) for further analysis in GFF-dependent tools, or into GAME XML for import into the Apollo genome editor. The modularity of the design allows new tools to be added to the system with little programmer overhead. The database application programming interface allows programmatic access to the data stored in the backend through SQL queries. CONCLUSIONS: The Pegasys system enables biologists and bioinformaticians to create and manage sequence analysis workflows. The software is released under the Open Source GNU General Public License. All source code and documentation are available for download at .
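The abstract describes workflows in which analyses that do not depend on one another run in parallel, with results exported to GFF. The following is a minimal sketch of that idea only, not the Pegasys implementation: the workflow contents, the function names (`run_analysis`, `parallel_batches`, `execute`, `gff_line`) and the use of Python threads in place of a compute cluster are all assumptions made for illustration. It treats the workflow as a directed acyclic graph and groups analyses into batches whose dependencies are already satisfied.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each analysis maps to the analyses it depends on.
WORKFLOW = {
    "repeat_masking": set(),
    "gene_prediction": {"repeat_masking"},
    "blast_alignment": {"repeat_masking"},
    "trna_detection": set(),
    "gff_export": {"gene_prediction", "blast_alignment", "trna_detection"},
}


def run_analysis(name):
    """Stand-in for launching a real tool; it only returns a label."""
    return f"{name}: done"


def parallel_batches(workflow):
    """Group analyses into batches whose members have no unmet dependencies."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        batches.append(ready)
        sorter.done(*ready)
    return batches


def execute(workflow):
    """Run each batch concurrently; batches themselves run one after another."""
    results = []
    with ThreadPoolExecutor() as pool:
        for batch in parallel_batches(workflow):
            results.extend(pool.map(run_analysis, batch))
    return results


def gff_line(seqid, source, feature, start, end,
             score=".", strand="+", phase=".", attributes="."):
    """Format one feature as the nine tab-separated columns of a GFF record."""
    fields = (seqid, source, feature, start, end,
              score, strand, phase, attributes)
    return "\t".join(str(field) for field in fields)


if __name__ == "__main__":
    for batch in parallel_batches(WORKFLOW):
        print(batch)
    print(gff_line("contig1", "example_tool", "exon", 100, 200))
```

Batching by dependency level is a simplification: a scheduler could instead start each analysis as soon as its own inputs finish, which avoids waiting on unrelated analyses in the same batch.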
format Text
id pubmed-406494
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-406494 2004-05-13 BMC Bioinformatics Software BioMed Central 2004-04-19 /pmc/articles/PMC406494/ /pubmed/15096276 http://dx.doi.org/10.1186/1471-2105-5-40 Text en
Copyright © 2004 Shah et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
topic Software