The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows
As genomic datasets continue to grow, the feasibility of downloading data to a local organization and running analysis on a traditional compute environment is becoming increasingly problematic. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platf...
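Main Authors: | O'Connor, Brian D.; Yuen, Denis; Chung, Vincent; Duncan, Andrew G.; Liu, Xiang Kun; Patricia, Janice; Paten, Benedict; Stein, Lincoln; Ferretti, Vincent |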
---|---|
Format: | Online Article Text |
Language: | English |
Published: | F1000Research 2017 |
Subjects: | Software Tool Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5333608/ https://www.ncbi.nlm.nih.gov/pubmed/28344774 http://dx.doi.org/10.12688/f1000research.10137.1 |
_version_ | 1782511742615552000 |
---|---|
author | O'Connor, Brian D. Yuen, Denis Chung, Vincent Duncan, Andrew G. Liu, Xiang Kun Patricia, Janice Paten, Benedict Stein, Lincoln Ferretti, Vincent |
author_sort | O'Connor, Brian D. |
collection | PubMed |
description | As genomic datasets continue to grow, the feasibility of downloading data to a local organization and running analysis on a traditional compute environment is becoming increasingly problematic. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platform for the U.S. Precision Medicine Initiative, and the NIH Big Data to Knowledge Center for Translational Genomics, are using cloud-based infrastructure to both host and perform analysis across large data sets. In PCAWG, over 5,800 whole human genomes were aligned and variant called across 14 cloud and HPC environments; the processed data was then made available on the cloud for further analysis and sharing. If run locally, an operation at this scale would have monopolized a typical academic data centre for many months, and would have presented major challenges for data storage and distribution. However, this scale is increasingly typical for genomics projects and necessitates a rethink of how analytical tools are packaged and moved to the data. For PCAWG, we embraced the use of highly portable Docker images for encapsulating and sharing complex alignment and variant calling workflows across highly variable environments. While successful, this endeavor revealed a limitation in Docker containers, namely the lack of a standardized way to describe and execute the tools encapsulated inside the container. As a result, we created the Dockstore (https://dockstore.org), a project that brings together Docker images with standardized, machine-readable ways of describing and running the tools contained within. This service greatly improves the sharing and reuse of genomics tools and promotes interoperability with similar projects through emerging web service standards developed by the Global Alliance for Genomics and Health (GA4GH). |
format | Online Article Text |
id | pubmed-5333608 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | F1000Research |
record_format | MEDLINE/PubMed |
spelling | pubmed-5333608 2017-03-23 The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows O'Connor, Brian D. Yuen, Denis Chung, Vincent Duncan, Andrew G. Liu, Xiang Kun Patricia, Janice Paten, Benedict Stein, Lincoln Ferretti, Vincent F1000Res Software Tool Article As genomic datasets continue to grow, the feasibility of downloading data to a local organization and running analysis on a traditional compute environment is becoming increasingly problematic. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platform for the U.S. Precision Medicine Initiative, and the NIH Big Data to Knowledge Center for Translational Genomics, are using cloud-based infrastructure to both host and perform analysis across large data sets. In PCAWG, over 5,800 whole human genomes were aligned and variant called across 14 cloud and HPC environments; the processed data was then made available on the cloud for further analysis and sharing. If run locally, an operation at this scale would have monopolized a typical academic data centre for many months, and would have presented major challenges for data storage and distribution. However, this scale is increasingly typical for genomics projects and necessitates a rethink of how analytical tools are packaged and moved to the data. For PCAWG, we embraced the use of highly portable Docker images for encapsulating and sharing complex alignment and variant calling workflows across highly variable environments. While successful, this endeavor revealed a limitation in Docker containers, namely the lack of a standardized way to describe and execute the tools encapsulated inside the container. As a result, we created the Dockstore (https://dockstore.org), a project that brings together Docker images with standardized, machine-readable ways of describing and running the tools contained within. This service greatly improves the sharing and reuse of genomics tools and promotes interoperability with similar projects through emerging web service standards developed by the Global Alliance for Genomics and Health (GA4GH). F1000Research 2017-01-18 /pmc/articles/PMC5333608/ /pubmed/28344774 http://dx.doi.org/10.12688/f1000research.10137.1 Text en Copyright: © 2017 O'Connor BD et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Software Tool Article O'Connor, Brian D. Yuen, Denis Chung, Vincent Duncan, Andrew G. Liu, Xiang Kun Patricia, Janice Paten, Benedict Stein, Lincoln Ferretti, Vincent The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows |
title | The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows |
title_sort | dockstore: enabling modular, community-focused sharing of docker-based genomics tools and workflows |
topic | Software Tool Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5333608/ https://www.ncbi.nlm.nih.gov/pubmed/28344774 http://dx.doi.org/10.12688/f1000research.10137.1 |