Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data
The availability of high-throughput molecular profiling techniques has provided more accurate and informative data for regular clinical studies. Nevertheless, complex computational workflows are required to interpret these data. Over the past years, the data volume has been growing explosively, requ...
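Main Authors: | Zhang, Chao; Bijlard, Jochem; Staiger, Christine; Scollen, Serena; van Enckevort, David; Hoogstrate, Youri; Senf, Alexander; Hiltemann, Saskia; Repo, Susanna; Pipping, Wibo; Bierkens, Mariska; Payralbe, Stefan; Stringer, Bas; Heringa, Jaap; Stubbs, Andrew; Bonino Da Silva Santos, Luiz Olavo; Belien, Jeroen; Weistra, Ward; Azevedo, Rita; van Bochove, Kees; Meijer, Gerrit; Boiten, Jan-Willem; Rambla, Jordi; Fijneman, Remond; Spalding, J. Dylan; Abeln, Sanne |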
---|---|
Format: | Online Article Text |
Language: | English |
Published: | F1000Research 2017 |
Subjects: | Method Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5657030/ https://www.ncbi.nlm.nih.gov/pubmed/29123641 http://dx.doi.org/10.12688/f1000research.12168.1 |
_version_ | 1783273806066876416 |
---|---|
author | Zhang, Chao; Bijlard, Jochem; Staiger, Christine; Scollen, Serena; van Enckevort, David; Hoogstrate, Youri; Senf, Alexander; Hiltemann, Saskia; Repo, Susanna; Pipping, Wibo; Bierkens, Mariska; Payralbe, Stefan; Stringer, Bas; Heringa, Jaap; Stubbs, Andrew; Bonino Da Silva Santos, Luiz Olavo; Belien, Jeroen; Weistra, Ward; Azevedo, Rita; van Bochove, Kees; Meijer, Gerrit; Boiten, Jan-Willem; Rambla, Jordi; Fijneman, Remond; Spalding, J. Dylan; Abeln, Sanne |
author_facet | Zhang, Chao; Bijlard, Jochem; Staiger, Christine; Scollen, Serena; van Enckevort, David; Hoogstrate, Youri; Senf, Alexander; Hiltemann, Saskia; Repo, Susanna; Pipping, Wibo; Bierkens, Mariska; Payralbe, Stefan; Stringer, Bas; Heringa, Jaap; Stubbs, Andrew; Bonino Da Silva Santos, Luiz Olavo; Belien, Jeroen; Weistra, Ward; Azevedo, Rita; van Bochove, Kees; Meijer, Gerrit; Boiten, Jan-Willem; Rambla, Jordi; Fijneman, Remond; Spalding, J. Dylan; Abeln, Sanne |
author_sort | Zhang, Chao |
collection | PubMed |
description | The availability of high-throughput molecular profiling techniques has provided more accurate and informative data for regular clinical studies. Nevertheless, complex computational workflows are required to interpret these data. Over the past years, the data volume has been growing explosively, requiring robust human data management to organise and integrate the data efficiently. For this reason, we set up an ELIXIR implementation study, together with the Translational research IT (TraIT) programme, to design a data ecosystem that is able to link raw and interpreted data. In this project, the data from the TraIT Cell Line Use Case (TraIT-CLUC) are used as a test case for this system. Within this ecosystem, we use the European Genome-phenome Archive (EGA) to store raw molecular profiling data; tranSMART to collect interpreted molecular profiling data and clinical data for corresponding samples; and Galaxy to store, run and manage the computational workflows. We can integrate these data by linking their repositories systematically. To showcase our design, we have structured the TraIT-CLUC data, which contain a variety of molecular profiling data types, for storage in both tranSMART and EGA. The metadata provided allows referencing between tranSMART and EGA, fulfilling the cycle of data submission and discovery; we have also designed a data flow from EGA to Galaxy, enabling reanalysis of the raw data in Galaxy. In this way, users can select patient cohorts in tranSMART, trace them back to the raw data and perform (re)analysis in Galaxy. Our conclusion is that the majority of metadata does not necessarily need to be stored (redundantly) in both databases, but that instead FAIR persistent identifiers should be available for well-defined data ontology levels: study, data access committee, physical sample, data sample and raw data file. This approach will pave the way for the stable linkage and reuse of data. |
format | Online Article Text |
id | pubmed-5657030 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | F1000Research |
record_format | MEDLINE/PubMed |
spelling | pubmed-5657030 2017-11-08 Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data Zhang, Chao; Bijlard, Jochem; Staiger, Christine; Scollen, Serena; van Enckevort, David; Hoogstrate, Youri; Senf, Alexander; Hiltemann, Saskia; Repo, Susanna; Pipping, Wibo; Bierkens, Mariska; Payralbe, Stefan; Stringer, Bas; Heringa, Jaap; Stubbs, Andrew; Bonino Da Silva Santos, Luiz Olavo; Belien, Jeroen; Weistra, Ward; Azevedo, Rita; van Bochove, Kees; Meijer, Gerrit; Boiten, Jan-Willem; Rambla, Jordi; Fijneman, Remond; Spalding, J. Dylan; Abeln, Sanne F1000Res Method Article The availability of high-throughput molecular profiling techniques has provided more accurate and informative data for regular clinical studies. Nevertheless, complex computational workflows are required to interpret these data. Over the past years, the data volume has been growing explosively, requiring robust human data management to organise and integrate the data efficiently. For this reason, we set up an ELIXIR implementation study, together with the Translational research IT (TraIT) programme, to design a data ecosystem that is able to link raw and interpreted data. In this project, the data from the TraIT Cell Line Use Case (TraIT-CLUC) are used as a test case for this system. Within this ecosystem, we use the European Genome-phenome Archive (EGA) to store raw molecular profiling data; tranSMART to collect interpreted molecular profiling data and clinical data for corresponding samples; and Galaxy to store, run and manage the computational workflows. We can integrate these data by linking their repositories systematically. To showcase our design, we have structured the TraIT-CLUC data, which contain a variety of molecular profiling data types, for storage in both tranSMART and EGA. The metadata provided allows referencing between tranSMART and EGA, fulfilling the cycle of data submission and discovery; we have also designed a data flow from EGA to Galaxy, enabling reanalysis of the raw data in Galaxy. In this way, users can select patient cohorts in tranSMART, trace them back to the raw data and perform (re)analysis in Galaxy. Our conclusion is that the majority of metadata does not necessarily need to be stored (redundantly) in both databases, but that instead FAIR persistent identifiers should be available for well-defined data ontology levels: study, data access committee, physical sample, data sample and raw data file. This approach will pave the way for the stable linkage and reuse of data. F1000Research 2017-08-16 /pmc/articles/PMC5657030/ /pubmed/29123641 http://dx.doi.org/10.12688/f1000research.12168.1 Text en Copyright: © 2017 Zhang C et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Method Article Zhang, Chao; Bijlard, Jochem; Staiger, Christine; Scollen, Serena; van Enckevort, David; Hoogstrate, Youri; Senf, Alexander; Hiltemann, Saskia; Repo, Susanna; Pipping, Wibo; Bierkens, Mariska; Payralbe, Stefan; Stringer, Bas; Heringa, Jaap; Stubbs, Andrew; Bonino Da Silva Santos, Luiz Olavo; Belien, Jeroen; Weistra, Ward; Azevedo, Rita; van Bochove, Kees; Meijer, Gerrit; Boiten, Jan-Willem; Rambla, Jordi; Fijneman, Remond; Spalding, J. Dylan; Abeln, Sanne Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data |
title | Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data |
title_full | Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data |
title_fullStr | Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data |
title_full_unstemmed | Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data |
title_short | Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data |
title_sort | systematically linking transmart, galaxy and ega for reusing human translational research data |
topic | Method Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5657030/ https://www.ncbi.nlm.nih.gov/pubmed/29123641 http://dx.doi.org/10.12688/f1000research.12168.1 |
work_keys_str_mv | AT zhangchao systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT bijlardjochem systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT staigerchristine systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT scollenserena systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT vanenckevortdavid systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT hoogstrateyouri systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT senfalexander systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT hiltemannsaskia systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT reposusanna systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT pippingwibo systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT bierkensmariska systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT payralbestefan systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT stringerbas systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT heringajaap systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT stubbsandrew systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT boninodasilvasantosluizolavo systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT belienjeroen systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT weistraward systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT azevedorita systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT vanbochovekees 
systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT meijergerrit systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT boitenjanwillem systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT ramblajordi systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT fijnemanremond systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT spaldingjdylan systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata AT abelnsanne systematicallylinkingtransmartgalaxyandegaforreusinghumantranslationalresearchdata |
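
The description in this record concludes that persistent identifiers should exist at five levels: study, data access committee, physical sample, data sample and raw data file. As a rough illustration only, the sketch below shows how a table of such identifiers might let a cohort chosen by physical sample be traced back to raw data files. The `LinkRecord` class, the `raw_files_for_cohort` function and every identifier value are hypothetical and are not taken from the article, tranSMART, EGA or Galaxy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkRecord:
    """One row linking the five identifier levels named in the description.

    Hypothetical structure for illustration; not the authors' schema.
    """
    study_id: str            # study
    dac_id: str              # data access committee
    physical_sample_id: str  # physical sample (for example, a cell line)
    data_sample_id: str      # data sample derived from the physical sample
    raw_file_id: str         # raw data file held in the archive


def raw_files_for_cohort(links, cohort_sample_ids):
    """Return raw file IDs for the selected physical samples, grouped by DAC."""
    selected = set(cohort_sample_ids)
    by_dac = {}
    for link in links:
        if link.physical_sample_id in selected:
            by_dac.setdefault(link.dac_id, []).append(link.raw_file_id)
    return by_dac


# Placeholder identifiers, invented for this example.
LINKS = [
    LinkRecord("STUDY-1", "DAC-1", "SAMPLE-A", "DATA-A1", "FILE-001"),
    LinkRecord("STUDY-1", "DAC-1", "SAMPLE-A", "DATA-A2", "FILE-002"),
    LinkRecord("STUDY-1", "DAC-1", "SAMPLE-B", "DATA-B1", "FILE-003"),
]

result = raw_files_for_cohort(LINKS, ["SAMPLE-A"])
print(result)
```

Grouping by data access committee reflects the idea that access to raw files would be requested per committee; only the identifiers are shared between systems, so the remaining metadata need not be duplicated.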