
The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows

As genomic datasets continue to grow, the feasibility of downloading data to a local organization and running analysis on a traditional compute environment is becoming increasingly problematic. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platf...


Bibliographic Details
Main authors: O'Connor, Brian D., Yuen, Denis, Chung, Vincent, Duncan, Andrew G., Liu, Xiang Kun, Patricia, Janice, Paten, Benedict, Stein, Lincoln, Ferretti, Vincent
Format: Online Article Text
Language: English
Published: F1000Research 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5333608/
https://www.ncbi.nlm.nih.gov/pubmed/28344774
http://dx.doi.org/10.12688/f1000research.10137.1
author O'Connor, Brian D.
Yuen, Denis
Chung, Vincent
Duncan, Andrew G.
Liu, Xiang Kun
Patricia, Janice
Paten, Benedict
Stein, Lincoln
Ferretti, Vincent
author_sort O'Connor, Brian D.
collection PubMed
description As genomic datasets continue to grow, the feasibility of downloading data to a local organization and running analysis on a traditional compute environment is becoming increasingly problematic. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platform for the U.S. Precision Medicine Initiative, and the NIH Big Data to Knowledge Center for Translational Genomics, are using cloud-based infrastructure to both host and perform analysis across large data sets. In PCAWG, over 5,800 whole human genomes were aligned and variant called across 14 cloud and HPC environments; the processed data was then made available on the cloud for further analysis and sharing. If run locally, an operation at this scale would have monopolized a typical academic data centre for many months, and would have presented major challenges for data storage and distribution. However, this scale is increasingly typical for genomics projects and necessitates a rethink of how analytical tools are packaged and moved to the data. For PCAWG, we embraced the use of highly portable Docker images for encapsulating and sharing complex alignment and variant calling workflows across highly variable environments. While successful, this endeavor revealed a limitation in Docker containers, namely the lack of a standardized way to describe and execute the tools encapsulated inside the container. As a result, we created the Dockstore ( https://dockstore.org), a project that brings together Docker images with standardized, machine-readable ways of describing and running the tools contained within. This service greatly improves the sharing and reuse of genomics tools and promotes interoperability with similar projects through emerging web service standards developed by the Global Alliance for Genomics and Health (GA4GH).
format Online
Article
Text
id pubmed-5333608
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher F1000Research
record_format MEDLINE/PubMed
spelling pubmed-5333608 2017-03-23 The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows O'Connor, Brian D. Yuen, Denis Chung, Vincent Duncan, Andrew G. Liu, Xiang Kun Patricia, Janice Paten, Benedict Stein, Lincoln Ferretti, Vincent F1000Res Software Tool Article F1000Research 2017-01-18 /pmc/articles/PMC5333608/ /pubmed/28344774 http://dx.doi.org/10.12688/f1000research.10137.1 Text en Copyright: © 2017 O'Connor BD et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows
topic Software Tool Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5333608/
https://www.ncbi.nlm.nih.gov/pubmed/28344774
http://dx.doi.org/10.12688/f1000research.10137.1