
Developing and reusing bioinformatics data analysis pipelines using scientific workflow systems


Bibliographic Details
Main Authors:	Djaffardjy, Marine; Marchment, George; Sebe, Clémence; Blanchet, Raphael; Bellajhame, Khalid; Gaignard, Alban; Lemoine, Frédéric; Cohen-Boulakia, Sarah
Format:	Online Article Text
Language:	English
Published:	Research Network of Computational and Structural Biotechnology, 2023
Subjects:	Review Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10030817/ https://www.ncbi.nlm.nih.gov/pubmed/36968012 http://dx.doi.org/10.1016/j.csbj.2023.03.003

author	Djaffardjy, Marine; Marchment, George; Sebe, Clémence; Blanchet, Raphael; Bellajhame, Khalid; Gaignard, Alban; Lemoine, Frédéric; Cohen-Boulakia, Sarah
collection	PubMed
description	Data analysis pipelines are now established as an effective means for specifying and executing bioinformatics data analysis and experiments. While scripting languages, particularly Python, R and notebooks, are popular and sufficient for developing small-scale pipelines that are often intended for a single user, it is now widely recognized that they are by no means enough to support the development of large-scale, shareable, maintainable and reusable pipelines capable of handling large volumes of data and running on high performance computing clusters. This review outlines the key requirements for building large-scale data pipelines and provides a mapping of existing solutions that fulfill them. We then highlight the benefits of using scientific workflow systems to get modular, reproducible and reusable bioinformatics data analysis pipelines. We finally discuss current workflow reuse practices based on an empirical study we performed on a large collection of workflows.
format	Online Article Text
id	pubmed-10030817
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Research Network of Computational and Structural Biotechnology
record_format	MEDLINE/PubMed
spelling	pubmed-10030817 2023-03-23 Developing and reusing bioinformatics data analysis pipelines using scientific workflow systems Djaffardjy, Marine; Marchment, George; Sebe, Clémence; Blanchet, Raphael; Bellajhame, Khalid; Gaignard, Alban; Lemoine, Frédéric; Cohen-Boulakia, Sarah Comput Struct Biotechnol J Review Article Data analysis pipelines are now established as an effective means for specifying and executing bioinformatics data analysis and experiments. While scripting languages, particularly Python, R and notebooks, are popular and sufficient for developing small-scale pipelines that are often intended for a single user, it is now widely recognized that they are by no means enough to support the development of large-scale, shareable, maintainable and reusable pipelines capable of handling large volumes of data and running on high performance computing clusters. This review outlines the key requirements for building large-scale data pipelines and provides a mapping of existing solutions that fulfill them. We then highlight the benefits of using scientific workflow systems to get modular, reproducible and reusable bioinformatics data analysis pipelines. We finally discuss current workflow reuse practices based on an empirical study we performed on a large collection of workflows. Research Network of Computational and Structural Biotechnology 2023-03-07 /pmc/articles/PMC10030817/ /pubmed/36968012 http://dx.doi.org/10.1016/j.csbj.2023.03.003 Text en © 2023 Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title	Developing and reusing bioinformatics data analysis pipelines using scientific workflow systems
topic	Review Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10030817/ https://www.ncbi.nlm.nih.gov/pubmed/36968012 http://dx.doi.org/10.1016/j.csbj.2023.03.003
