RES2/406: Making Complex Datasets Available over the Web

INTRODUCTION: The internet is the (current) ideal medium for sharing simple data: but the tools for describing complicated datasets, and the ethics and resulting technology for sharing confidential data are less well understood. METHODS: I first describe a simple dataset we've put on the web -...

Bibliographic Details
Main Author:	Walker, N
Format:	Text
Language:	English
Published:	Gunther Eysenbach 1999
Subjects:	Abstract
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1761761/ http://dx.doi.org/10.2196/jmir.1.suppl1.e78

_version_	1782131470925561856
author	Walker, N
author_facet	Walker, N
author_sort	Walker, N
collection	PubMed
description	INTRODUCTION: The internet is the (current) ideal medium for sharing simple data: but the tools for describing complicated datasets, and the ethics and resulting technology for sharing confidential data are less well understood. METHODS: I first describe a simple dataset we've put on the web - some of the world's first genome screen data. The data is anonymous; there was full subject consent; there is no foreseeable subject harm/benefit from data release; and the data sets are in a form readily understood by scientists working in the field. I then describe a large-scale longitudinal epidemiological study, and the tools used to make this comprehensible to secondary data users - the main innovation being a searchable data dictionary and interactive decision support for selecting data subsets from the multi-thousand variable whole. Thirdly I describe the current data access arrangements - "good enough" anonymity, and ftp access for signed-up collaborators. Lastly I describe fully-functioning experimental alternatives: aggregated tables (generated with reference to the data dictionary) and raw data access for named collaborators via encryption, the web's HTTPS protocol using Secure Sockets Layer. RESULTS: Datasets can be shared via the Web, however complex or confidential. For a simple (but important) dataset, see: http://www.mrc-bsu.cam.ac.uk/MSgenetics/ . For a complex dataset and support tools, see: http://www.mrc-bsu.cam.ac.uk/cfas/ or https://www.mrc-bsu.cam.ac.uk/cfas/. This currently uses US-export (i.e. weak) levels of encryption. DISCUSSION: There is increasing pressure (from, for example, the Medical Research Council in the UK) to share data collected during publicly-funded medical research. While the social sciences have shared data for many years via archive sites, "patient confidentiality" has prevented it in the medical world. 
Ironically, the increased use of biological samples - which require far greater stress on confidentiality and the anonymity of public records - has led to proposals for public databases of, and potential competition for, these scarce, expensive resources. For social sciences, record anonymisation is the stripping of identifiers, but they also rely on the fierce legalese of "undertaking forms" to prevent subject identification. This model is breaking down with linked genotypic/phenotypic data - where it might become hugely financially worthwhile to identify a study subject. The data dictionary approach - adopted as an aid to understanding a large complex dataset - can also be used to generate anonymised subsets of the data, and aggregated tables live on the Web. However, full access will require the newer, secure web protocols - if we can find the political and financial will to buy them in from the States.
format	Text
id	pubmed-1761761
institution	National Center for Biotechnology Information
language	English
publishDate	1999
publisher	Gunther Eysenbach
record_format	MEDLINE/PubMed
spelling	pubmed-17617612007-01-03 RES2/406: Making Complex Datasets Available over the Web Walker, N J Med Internet Res Abstract INTRODUCTION: The internet is the (current) ideal medium for sharing simple data: but the tools for describing complicated datasets, and the ethics and resulting technology for sharing confidential data are less well understood. METHODS: I first describe a simple dataset we've put on the web - some of the world's first genome screen data. The data is anonymous; there was full subject consent; there is no foreseeable subject harm/benefit from data release; and the data sets are in a form readily understood by scientists working in the field. I then describe a large-scale longitudinal epidemiological study, and the tools used to make this comprehensible to secondary data users - the main innovation being a searchable data dictionary and interactive decision support for selecting data subsets from the multi-thousand variable whole. Thirdly I describe the current data access arrangements - "good enough" anonymity, and ftp access for signed-up collaborators. Lastly I describe fully-functioning experimental alternatives: aggregated tables (generated with reference to the data dictionary) and raw data access for named collaborators via encryption, the web's HTTPS protocol using Secure Sockets Layer. RESULTS: Datasets can be shared via the Web, however complex or confidential. For a simple (but important) dataset, see: http://www.mrc-bsu.cam.ac.uk/MSgenetics/ . For a complex dataset and support tools, see: http://www.mrc-bsu.cam.ac.uk/cfas/ or https://www.mrc-bsu.cam.ac.uk/cfas/. This currently uses US-export (i.e. weak) levels of encryption. DISCUSSION: There is increasing pressure (from, for example, the Medical Research Council in the UK) to share data collected during publicly-funded medical research. While the social sciences have shared data for many years via archive sites, "patient confidentiality" has prevented it in the medical world. 
Ironically, the increased use of biological samples - which require far greater stress on confidentiality and the anonymity of public records - has led to proposals for public databases of, and potential competition for, these scarce, expensive resources. For social sciences, record anonymisation is the stripping of identifiers, but they also rely on the fierce legalese of "undertaking forms" to prevent subject identification. This model is breaking down with linked genotypic/phenotypic data - where it might become hugely financially worthwhile to identify a study subject. The data dictionary approach - adopted as an aid to understanding a large complex dataset - can also be used to generate anonymised subsets of the data, and aggregated tables live on the Web. However, full access will require the newer, secure web protocols - if we can find the political and financial will to buy them in from the States. Gunther Eysenbach 1999-09-19 /pmc/articles/PMC1761761/ http://dx.doi.org/10.2196/jmir.1.suppl1.e78 Text en Except where otherwise noted, articles published in the Journal of Medical Internet Research are distributed under the terms of the Creative Commons Attribution License (http://www.creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Abstract Walker, N RES2/406: Making Complex Datasets Available over the Web
title	RES2/406: Making Complex Datasets Available over the Web
title_full	RES2/406: Making Complex Datasets Available over the Web
title_fullStr	RES2/406: Making Complex Datasets Available over the Web
title_full_unstemmed	RES2/406: Making Complex Datasets Available over the Web
title_short	RES2/406: Making Complex Datasets Available over the Web
title_sort	res2/406: making complex datasets available over the web
topic	Abstract
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1761761/ http://dx.doi.org/10.2196/jmir.1.suppl1.e78
work_keys_str_mv	AT walkern res2406makingcomplexdatasetsavailableovertheweb