Natural Language Search Interfaces: Health Data Needs Single-Field Variable Search

BACKGROUND: Data discovery, particularly the discovery of key variables and their inter-relationships, is key to secondary data analysis and, in turn, the evolving field of data science. Interface designers have presumed that their users are domain experts, and so they have provided complex interfac...

Bibliographic Details
Main authors: Jay, Caroline; Harper, Simon; Dunlop, Ian; Smith, Sam; Sufi, Shoaib; Goble, Carole; Buchan, Iain
Format: Online Article Text
Language: English
Published: JMIR Publications Inc. 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4731680/
https://www.ncbi.nlm.nih.gov/pubmed/26769334
http://dx.doi.org/10.2196/jmir.4912
author Jay, Caroline
Harper, Simon
Dunlop, Ian
Smith, Sam
Sufi, Shoaib
Goble, Carole
Buchan, Iain
collection PubMed
description BACKGROUND: Data discovery, particularly the discovery of key variables and their inter-relationships, is key to secondary data analysis and, in turn, the evolving field of data science. Interface designers have presumed that their users are domain experts, and so they have provided complex interfaces to support these “experts.” Such interfaces hark back to a time when searches needed to be accurate the first time, as there was a high computational cost associated with each search. Our work is part of a governmental research initiative between the medical and social research funding bodies to improve the use of social data in medical research.
OBJECTIVE: The cross-disciplinary nature of data science can make no assumptions regarding the domain expertise of a particular scientist, whose interests may intersect multiple domains. Here we consider the common requirement for scientists to seek archived data for secondary analysis. This has more in common with the search needs of the “Google generation” than with those of their single-domain, single-tool forebears. Our study compares a Google-like interface with traditional ways of searching for noncomplex health data in a data archive.
METHODS: Two user interfaces are evaluated for the same set of tasks in extracting data from surveys stored in the UK Data Archive (UKDA). One interface, Web search, is “Google-like,” enabling users to browse, search for, and view metadata about study variables, whereas the other, traditional search, has a standard multioption user interface.
RESULTS: Using a comprehensive set of tasks with 20 volunteers, we found that the Web search interface met data discovery needs and expectations better than the traditional search. A task × interface repeated measures analysis showed a main effect indicating that answers found through the Web search interface were more likely to be correct (F (1,19)=37.3, P<.001), with a main effect of task (F (3,57)=6.3, P<.001). Further, participants completed the task significantly faster using the Web search interface (F (1,19)=18.0, P<.001). There was also a main effect of task (F (2,38)=4.1, P=.025, Greenhouse-Geisser correction applied). Overall, participants were asked to rate learnability, ease of use, and satisfaction. Paired mean comparisons showed that the Web search interface received significantly higher ratings than the traditional search interface for learnability (P=.002, 95% CI [0.6-2.4]), ease of use (P<.001, 95% CI [1.2-3.2]), and satisfaction (P<.001, 95% CI [1.8-3.5]). The results show superior cross-domain usability of Web search, which is consistent with its general familiarity and with enabling queries to be refined as the search proceeds, which treats serendipity as part of the refinement.
CONCLUSIONS: The results provide clear evidence that data science should adopt single-field natural language search interfaces for variable search supporting in particular: query reformulation; data browsing; faceted search; surrogates; relevance feedback; summarization, analytics, and visual presentation.
format Online
Article
Text
id pubmed-4731680
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher JMIR Publications Inc.
record_format MEDLINE/PubMed
spelling pubmed-4731680 2016-02-16 Natural Language Search Interfaces: Health Data Needs Single-Field Variable Search Jay, Caroline Harper, Simon Dunlop, Ian Smith, Sam Sufi, Shoaib Goble, Carole Buchan, Iain J Med Internet Res Original Paper JMIR Publications Inc. 2016-01-14 /pmc/articles/PMC4731680/ /pubmed/26769334 http://dx.doi.org/10.2196/jmir.4912 Text en ©Caroline Jay, Simon Harper, Ian Dunlop, Sam Smith, Shoaib Sufi, Carole Goble, Iain Buchan. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 14.01.2016.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
title Natural Language Search Interfaces: Health Data Needs Single-Field Variable Search
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4731680/
https://www.ncbi.nlm.nih.gov/pubmed/26769334
http://dx.doi.org/10.2196/jmir.4912