BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments
Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing techniques and can benefit from specialized...
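Main authors: | Mondelli, Maria Luiza; Magalhães, Thiago; Loss, Guilherme; Wilde, Michael; Foster, Ian; Mattoso, Marta; Katz, Daniel; Barbosa, Helio; de Vasconcelos, Ana Tereza R.; Ocaña, Kary; Gadelha, Luiz M.R. |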
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2018 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6119457/ https://www.ncbi.nlm.nih.gov/pubmed/30186700 http://dx.doi.org/10.7717/peerj.5551 |
_version_ | 1783352089547636736 |
---|---|
author | Mondelli, Maria Luiza; Magalhães, Thiago; Loss, Guilherme; Wilde, Michael; Foster, Ian; Mattoso, Marta; Katz, Daniel; Barbosa, Helio; de Vasconcelos, Ana Tereza R.; Ocaña, Kary; Gadelha, Luiz M.R. |
author_facet | Mondelli, Maria Luiza; Magalhães, Thiago; Loss, Guilherme; Wilde, Michael; Foster, Ian; Mattoso, Marta; Katz, Daniel; Barbosa, Helio; de Vasconcelos, Ana Tereza R.; Ocaña, Kary; Gadelha, Luiz M.R. |
author_sort | Mondelli, Maria Luiza |
collection | PubMed |
description | Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives by using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process. |
format | Online Article Text |
id | pubmed-6119457 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-6119457 2018-09-05 BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments Mondelli, Maria Luiza; Magalhães, Thiago; Loss, Guilherme; Wilde, Michael; Foster, Ian; Mattoso, Marta; Katz, Daniel; Barbosa, Helio; de Vasconcelos, Ana Tereza R.; Ocaña, Kary; Gadelha, Luiz M.R. PeerJ Bioinformatics Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives by using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process. PeerJ Inc. 2018-08-29 /pmc/articles/PMC6119457/ /pubmed/30186700 http://dx.doi.org/10.7717/peerj.5551 Text en © 2018 Mondelli et al.
http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Mondelli, Maria Luiza Magalhães, Thiago Loss, Guilherme Wilde, Michael Foster, Ian Mattoso, Marta Katz, Daniel Barbosa, Helio de Vasconcelos, Ana Tereza R. Ocaña, Kary Gadelha, Luiz M.R. BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments |
title | BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments |
title_full | BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments |
title_fullStr | BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments |
title_full_unstemmed | BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments |
title_short | BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments |
title_sort | bioworkbench: a high-performance framework for managing and analyzing bioinformatics experiments |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6119457/ https://www.ncbi.nlm.nih.gov/pubmed/30186700 http://dx.doi.org/10.7717/peerj.5551 |