eHive: An Artificial Intelligence workflow system for genomic analysis

Bibliographic Details
Main authors: Severin, Jessica; Beal, Kathryn; Vilella, Albert J; Fitzgerald, Stephen; Schuster, Michael; Gordon, Leo; Ureta-Vidal, Abel; Flicek, Paul; Herrero, Javier
Format: Text
Language: English
Published: BioMed Central, 2010
Subjects: Methodology article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2885371/
https://www.ncbi.nlm.nih.gov/pubmed/20459813
http://dx.doi.org/10.1186/1471-2105-11-240
collection PubMed
description BACKGROUND: The Ensembl project updates its comparative genomics resources with each of its several releases per year. During each release cycle, approximately two weeks are allocated to generating all the genomic alignments and protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl, and we expect this number to keep growing. RESULTS: We present eHive, a new fault-tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network-distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system, a MySQL database serves as the central blackboard, and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole-genome alignments, (2) multiple whole-genome alignments and (3) gene trees with protein homology inference. Finally, we demonstrate the efficiency of the system in real-world scenarios. CONCLUSIONS: eHive allows us to produce computationally demanding results reliably and efficiently, with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.
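The description outlines a blackboard design. A central database holds the list of jobs, autonomous agents claim and run jobs, and dataflow rules turn one job's output into new downstream jobs. The block below is a minimal sketch of that idea under stated assumptions. Python's built-in sqlite3 stands in for eHive's MySQL blackboard. The `job` table, its columns, the two analyses (`split_genome`, `align_chunk`), the `DATAFLOW` mapping and `MAX_RETRIES` are all hypothetical. They are not eHive's actual schema or its Perl API.

```python
import sqlite3

# Hypothetical blackboard schema: one row per job. eHive uses MySQL;
# sqlite3 stands in here only so the sketch runs without a server.
SCHEMA = """
CREATE TABLE job (
    job_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    analysis TEXT NOT NULL,
    input    TEXT NOT NULL,
    status   TEXT NOT NULL DEFAULT 'READY',
    retries  INTEGER NOT NULL DEFAULT 0
)
"""

# Hypothetical dataflow rules: outputs of one analysis become inputs of the next.
DATAFLOW = {"split_genome": "align_chunk", "align_chunk": None}
MAX_RETRIES = 2  # hypothetical retry limit before a job is marked FAILED


def claim_job(db, worker_id):
    """Claim one READY job for this agent; return (job_id, analysis, input) or None."""
    while True:
        with db:  # commit on success, roll back on error
            row = db.execute(
                "SELECT job_id, analysis, input FROM job "
                "WHERE status = 'READY' ORDER BY job_id LIMIT 1"
            ).fetchone()
            if row is None:
                return None
            # Conditional update: succeeds only if no other agent claimed the job first.
            cur = db.execute(
                "UPDATE job SET status = ? WHERE job_id = ? AND status = 'READY'",
                (f"CLAIMED:{worker_id}", row[0]),
            )
            if cur.rowcount == 1:
                return row
        # Another agent won the race for this job; look for the next one.


def run(analysis, payload):
    """Hypothetical stand-in for real work: split a genome into three chunks."""
    if analysis == "split_genome":
        return [f"{payload}:chunk{i}" for i in range(3)]
    return []  # align_chunk seeds no downstream jobs in this sketch


def worker(db, worker_id):
    """Autonomous agent loop: claim and run jobs until none are READY."""
    done = 0
    while (job := claim_job(db, worker_id)) is not None:
        job_id, analysis, payload = job
        try:
            outputs = run(analysis, payload)
        except Exception:
            # Fault tolerance: put the job back on the blackboard or give up on it.
            # SQLite evaluates SET expressions on the pre-update row values.
            with db:
                db.execute(
                    "UPDATE job SET retries = retries + 1, status = CASE "
                    "WHEN retries + 1 > ? THEN 'FAILED' ELSE 'READY' END "
                    "WHERE job_id = ?",
                    (MAX_RETRIES, job_id),
                )
            continue
        with db:
            db.execute("UPDATE job SET status = 'DONE' WHERE job_id = ?", (job_id,))
            next_analysis = DATAFLOW.get(analysis)
            if next_analysis is not None:
                # Dataflow: each output becomes a new READY job downstream.
                db.executemany(
                    "INSERT INTO job (analysis, input) VALUES (?, ?)",
                    [(next_analysis, out) for out in outputs],
                )
        done += 1
    return done


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    with db:
        db.execute("INSERT INTO job (analysis, input) VALUES ('split_genome', 'human')")
    print(worker(db, "worker-1"))  # 4: one split job plus three alignment chunks
```

With a single seeded `split_genome` job, one agent processes four jobs. The conditional `UPDATE ... AND status = 'READY'` is the part meant to stop two agents sharing a server-backed blackboard from claiming the same job. With one in-process connection, as here, it is only illustrative.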
id pubmed-2885371
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-2885371 (2010-06-15). BMC Bioinformatics, Methodology article. BioMed Central, 2010-05-11. Copyright ©2010 Severin et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.