
Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics

BACKGROUND: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Health and Heredity in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wel...


Bibliographic Details
Main authors: Baichoo, Shakuntala; Souilmi, Yassine; Panji, Sumir; Botha, Gerrit; Meintjes, Ayton; Hazelhurst, Scott; Bendou, Hocine; Beste, Eugene de; Mpangase, Phelelani T.; Souiai, Oussema; Alghali, Mustafa; Yi, Long; O’Connor, Brian D.; Crusoe, Michael; Armstrong, Don; Aron, Shaun; Joubert, Fourie; Ahmed, Azza E.; Mbiyavanga, Mamana; Heusden, Peter van; Magosi, Lerato E.; Zermeno, Jennie; Mainzer, Liudmila Sergeevna; Fadlelmola, Faisal M.; Jongeneel, C. Victor; Mulder, Nicola
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6264621/
https://www.ncbi.nlm.nih.gov/pubmed/30486782
http://dx.doi.org/10.1186/s12859-018-2446-1
author Baichoo, Shakuntala
Souilmi, Yassine
Panji, Sumir
Botha, Gerrit
Meintjes, Ayton
Hazelhurst, Scott
Bendou, Hocine
Beste, Eugene de
Mpangase, Phelelani T.
Souiai, Oussema
Alghali, Mustafa
Yi, Long
O’Connor, Brian D.
Crusoe, Michael
Armstrong, Don
Aron, Shaun
Joubert, Fourie
Ahmed, Azza E.
Mbiyavanga, Mamana
Heusden, Peter van
Magosi, Lerato E.
Zermeno, Jennie
Mainzer, Liudmila Sergeevna
Fadlelmola, Faisal M.
Jongeneel, C. Victor
Mulder, Nicola
author_sort Baichoo, Shakuntala
collection PubMed
description BACKGROUND: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Health and Heredity in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally-intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging.
RESULTS: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language framework (CWL) and two using Nextflow. All the workflows are containerized for improved portability and reproducibility using Docker, and are publicly available for use by members of the H3Africa consortium and the international research community.
CONCLUSION: The H3ABioNet workflows have been implemented in view of offering ease of use for the end user and high levels of reproducibility and portability, all while following modern state of the art bioinformatics data processing protocols. The H3ABioNet workflows will service the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt for their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity and assist genomics research within Africa and serve to increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.
format Online
Article
Text
id pubmed-6264621
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-62646212018-12-05 Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics Baichoo, Shakuntala Souilmi, Yassine Panji, Sumir Botha, Gerrit Meintjes, Ayton Hazelhurst, Scott Bendou, Hocine Beste, Eugene de Mpangase, Phelelani T. Souiai, Oussema Alghali, Mustafa Yi, Long O’Connor, Brian D. Crusoe, Michael Armstrong, Don Aron, Shaun Joubert, Fourie Ahmed, Azza E. Mbiyavanga, Mamana Heusden, Peter van Magosi, Lerato E. Zermeno, Jennie Mainzer, Liudmila Sergeevna Fadlelmola, Faisal M. Jongeneel, C. Victor Mulder, Nicola BMC Bioinformatics Software BACKGROUND: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Health and Heredity in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally-intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging. RESULTS: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. 
A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language framework (CWL) and two using Nextflow. All the workflows are containerized for improved portability and reproducibility using Docker, and are publicly available for use by members of the H3Africa consortium and the international research community. CONCLUSION: The H3ABioNet workflows have been implemented in view of offering ease of use for the end user and high levels of reproducibility and portability, all while following modern state of the art bioinformatics data processing protocols. The H3ABioNet workflows will service the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt for their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity and assist genomics research within Africa and serve to increase the scientific output of H3Africa and its Pan-African Bioinformatics Network. BioMed Central 2018-11-29 /pmc/articles/PMC6264621/ /pubmed/30486782 http://dx.doi.org/10.1186/s12859-018-2446-1 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Developing reproducible bioinformatics analysis workflows for heterogeneous computing environments to support African genomics
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6264621/
https://www.ncbi.nlm.nih.gov/pubmed/30486782
http://dx.doi.org/10.1186/s12859-018-2446-1