
BigDataScript: a scripting language for data pipelines

Bibliographic Details
Main authors:	Cingolani, Pablo; Sladek, Rob; Blanchette, Mathieu
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2015
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271142/ https://www.ncbi.nlm.nih.gov/pubmed/25189778 http://dx.doi.org/10.1093/bioinformatics/btu595

author	Cingolani, Pablo; Sladek, Rob; Blanchette, Mathieu
collection	PubMed
description	Motivation: The analysis of large biological datasets often requires complex processing pipelines that run for a long time on large computational infrastructures. We designed and implemented a simple script-like programming language with a clean and minimalist syntax to develop and manage pipeline execution and provide robustness to various types of software and hardware failures as well as portability. Results: We introduce the BigDataScript (BDS) programming language for data processing pipelines, which improves abstraction from hardware resources and assists with robustness. Hardware abstraction allows BDS pipelines to run without modification on a wide range of computer architectures, from a small laptop to multi-core servers, server farms, clusters and clouds. BDS achieves robustness by incorporating the concepts of absolute serialization and lazy processing, thus allowing pipelines to recover from errors. By abstracting pipeline concepts at programming language level, BDS simplifies implementation, execution and management of complex bioinformatics pipelines, resulting in reduced development and debugging cycles as well as cleaner code. Availability and implementation: BigDataScript is available under open-source license at http://pcingola.github.io/BigDataScript. Contact: pablo.e.cingolani@gmail.com
format	Online Article Text
id	pubmed-4271142
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-4271142 2015-01-08 BigDataScript: a scripting language for data pipelines Cingolani, Pablo Sladek, Rob Blanchette, Mathieu Bioinformatics Original Papers Motivation: The analysis of large biological datasets often requires complex processing pipelines that run for a long time on large computational infrastructures. We designed and implemented a simple script-like programming language with a clean and minimalist syntax to develop and manage pipeline execution and provide robustness to various types of software and hardware failures as well as portability. Results: We introduce the BigDataScript (BDS) programming language for data processing pipelines, which improves abstraction from hardware resources and assists with robustness. Hardware abstraction allows BDS pipelines to run without modification on a wide range of computer architectures, from a small laptop to multi-core servers, server farms, clusters and clouds. BDS achieves robustness by incorporating the concepts of absolute serialization and lazy processing, thus allowing pipelines to recover from errors. By abstracting pipeline concepts at programming language level, BDS simplifies implementation, execution and management of complex bioinformatics pipelines, resulting in reduced development and debugging cycles as well as cleaner code. Availability and implementation: BigDataScript is available under open-source license at http://pcingola.github.io/BigDataScript. Contact: pablo.e.cingolani@gmail.com Oxford University Press 2015-01-01 2014-09-03 /pmc/articles/PMC4271142/ /pubmed/25189778 http://dx.doi.org/10.1093/bioinformatics/btu595 Text en © The Author 2014. Published by Oxford University Press. 
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	BigDataScript: a scripting language for data pipelines
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4271142/ https://www.ncbi.nlm.nih.gov/pubmed/25189778 http://dx.doi.org/10.1093/bioinformatics/btu595
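
The description field above names lazy processing as one of the mechanisms BDS uses to recover from errors. As an illustration only, the sketch below shows a make-style lazy-execution rule in Python: a step runs only when one of its output files is missing or older than one of its inputs, so rerunning a pipeline after a failure skips work that already finished. This is not BDS syntax or the BDS implementation; the `Step` tuple and the functions `needs_update` and `run_lazily` are hypothetical names, and absolute serialization (checkpointing the full program state) is not modeled here.

```python
import os
from typing import Callable, List, Sequence, Tuple

# Hypothetical step type: (name, input paths, output paths, action).
# `action` is a zero-argument callable expected to write the outputs.
Step = Tuple[str, Sequence[str], Sequence[str], Callable[[], None]]


def needs_update(outputs: Sequence[str], inputs: Sequence[str]) -> bool:
    """Lazy-processing rule (assumed, make-style timestamp comparison)."""
    # A step with no declared outputs cannot be checked, so always run it.
    if not outputs:
        return True
    # Run if any output file is missing.
    if not all(os.path.exists(p) for p in outputs):
        return True
    # Outputs exist and there are no inputs: nothing can be stale.
    if not inputs:
        return False
    # Run if the newest input is more recent than the oldest output.
    newest_in = max(os.path.getmtime(p) for p in inputs)
    oldest_out = min(os.path.getmtime(p) for p in outputs)
    return newest_in > oldest_out


def run_lazily(steps: List[Step]) -> List[str]:
    """Run steps in order, skipping those whose outputs are up to date.

    Returns the names of the steps that were actually executed.
    """
    executed = []
    for name, inputs, outputs, action in steps:
        if needs_update(outputs, inputs):
            action()
            executed.append(name)
    return executed
```

On a second invocation with unchanged inputs, `run_lazily` returns an empty list because every output is already newer than its inputs; touching an input file makes the dependent step run again. Comparing file timestamps rather than recording a log of completed steps means the rule still works if a previous run crashed partway through, since a half-finished step leaves a missing or stale output.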