
BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments

Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing techniques and can benefit from specialized...


Bibliographic Details
Main authors: Mondelli, Maria Luiza; Magalhães, Thiago; Loss, Guilherme; Wilde, Michael; Foster, Ian; Mattoso, Marta; Katz, Daniel; Barbosa, Helio; de Vasconcelos, Ana Tereza R.; Ocaña, Kary; Gadelha, Luiz M.R.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2018
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6119457/
https://www.ncbi.nlm.nih.gov/pubmed/30186700
http://dx.doi.org/10.7717/peerj.5551
author Mondelli, Maria Luiza
Magalhães, Thiago
Loss, Guilherme
Wilde, Michael
Foster, Ian
Mattoso, Marta
Katz, Daniel
Barbosa, Helio
de Vasconcelos, Ana Tereza R.
Ocaña, Kary
Gadelha, Luiz M.R.
author_sort Mondelli, Maria Luiza
collection PubMed
description Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process.
format Online
Article
Text
id pubmed-6119457
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-6119457 2018-09-05 BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments Mondelli, Maria Luiza; Magalhães, Thiago; Loss, Guilherme; Wilde, Michael; Foster, Ian; Mattoso, Marta; Katz, Daniel; Barbosa, Helio; de Vasconcelos, Ana Tereza R.; Ocaña, Kary; Gadelha, Luiz M.R. PeerJ Bioinformatics PeerJ Inc. 2018-08-29 /pmc/articles/PMC6119457/ /pubmed/30186700 http://dx.doi.org/10.7717/peerj.5551 Text en © 2018 Mondelli et al.
http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6119457/
https://www.ncbi.nlm.nih.gov/pubmed/30186700
http://dx.doi.org/10.7717/peerj.5551