Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data

The availability of high-throughput molecular profiling techniques has provided more accurate and informative data for regular clinical studies. Nevertheless, complex computational workflows are required to interpret these data. Over the past years, the data volume has been growing explosively, requ...

Bibliographic Details
Main authors: Zhang, Chao, Bijlard, Jochem, Staiger, Christine, Scollen, Serena, van Enckevort, David, Hoogstrate, Youri, Senf, Alexander, Hiltemann, Saskia, Repo, Susanna, Pipping, Wibo, Bierkens, Mariska, Payralbe, Stefan, Stringer, Bas, Heringa, Jaap, Stubbs, Andrew, Bonino Da Silva Santos, Luiz Olavo, Belien, Jeroen, Weistra, Ward, Azevedo, Rita, van Bochove, Kees, Meijer, Gerrit, Boiten, Jan-Willem, Rambla, Jordi, Fijneman, Remond, Spalding, J. Dylan, Abeln, Sanne
Format: Online Article Text
Language: English
Published: F1000Research 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5657030/
https://www.ncbi.nlm.nih.gov/pubmed/29123641
http://dx.doi.org/10.12688/f1000research.12168.1
_version_ 1783273806066876416
author Zhang, Chao
Bijlard, Jochem
Staiger, Christine
Scollen, Serena
van Enckevort, David
Hoogstrate, Youri
Senf, Alexander
Hiltemann, Saskia
Repo, Susanna
Pipping, Wibo
Bierkens, Mariska
Payralbe, Stefan
Stringer, Bas
Heringa, Jaap
Stubbs, Andrew
Bonino Da Silva Santos, Luiz Olavo
Belien, Jeroen
Weistra, Ward
Azevedo, Rita
van Bochove, Kees
Meijer, Gerrit
Boiten, Jan-Willem
Rambla, Jordi
Fijneman, Remond
Spalding, J. Dylan
Abeln, Sanne
author_sort Zhang, Chao
collection PubMed
description The availability of high-throughput molecular profiling techniques has provided more accurate and informative data for regular clinical studies. Nevertheless, complex computational workflows are required to interpret these data. Over the past years, the data volume has been growing explosively, requiring robust human data management to organise and integrate the data efficiently. For this reason, we set up an ELIXIR implementation study, together with the Translational research IT (TraIT) programme, to design a data ecosystem that is able to link raw and interpreted data. In this project, the data from the TraIT Cell Line Use Case (TraIT-CLUC) are used as a test case for this system. Within this ecosystem, we use the European Genome-phenome Archive (EGA) to store raw molecular profiling data; tranSMART to collect interpreted molecular profiling data and clinical data for corresponding samples; and Galaxy to store, run and manage the computational workflows. We can integrate these data by linking their repositories systematically. To showcase our design, we have structured the TraIT-CLUC data, which contain a variety of molecular profiling data types, for storage in both tranSMART and EGA. The metadata provided allows referencing between tranSMART and EGA, fulfilling the cycle of data submission and discovery; we have also designed a data flow from EGA to Galaxy, enabling reanalysis of the raw data in Galaxy. In this way, users can select patient cohorts in tranSMART, trace them back to the raw data and perform (re)analysis in Galaxy. Our conclusion is that the majority of metadata does not necessarily need to be stored (redundantly) in both databases, but that instead FAIR persistent identifiers should be available for well-defined data ontology levels: study, data access committee, physical sample, data sample and raw data file. This approach will pave the way for the stable linkage and reuse of data.
format Online
Article
Text
id pubmed-5657030
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher F1000Research
record_format MEDLINE/PubMed
spelling pubmed-5657030 2017-11-08 F1000Res Method Article F1000Research 2017-08-16 /pmc/articles/PMC5657030/ /pubmed/29123641 http://dx.doi.org/10.12688/f1000research.12168.1 Text en Copyright: © 2017 Zhang C et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Systematically linking tranSMART, Galaxy and EGA for reusing human translational research data
topic Method Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5657030/
https://www.ncbi.nlm.nih.gov/pubmed/29123641
http://dx.doi.org/10.12688/f1000research.12168.1