Making Statistical Data More Easily Accessible on the Web: Results of the StatSearch Case Study

This paper presents the results of the StatSearch case study, which aimed to provide enhanced access to statistical data available on the Web. Within the case study we developed a prototype information access tool that combines a query-based search engine with semi-automated navi...

Bibliographic Details
Main authors: Rajman, M, Vesely, M, Boynton, I M, Fridlund, B, Fyhrlund, A, Sundgren, B, Lundquist, P, Thelander, H, Wänerskär, M
Language: eng
Published: 2005
Subjects: Information Transfer and Management
Online access: http://cds.cern.ch/record/896654
_version_ 1780908615639498752
author Rajman, M
Vesely, M
Boynton, I M
Fridlund, B
Fyhrlund, A
Sundgren, B
Lundquist, P
Thelander, H
Wänerskär, M
author_facet Rajman, M
Vesely, M
Boynton, I M
Fridlund, B
Fyhrlund, A
Sundgren, B
Lundquist, P
Thelander, H
Wänerskär, M
author_sort Rajman, M
collection CERN
description This paper presents the results of the StatSearch case study, which aimed to provide enhanced access to statistical data available on the Web. Within the case study we developed a prototype information access tool that combines a query-based search engine with semi-automated navigation techniques exploiting the hierarchical structure of the available data. The tool gives users better control over information retrieval, improving the quality and ease of access to statistical information. The core of the StatSearch tool is an algorithm for automated navigation through a tree-like hierarchical document structure. The algorithm computes query-related relevance score distributions over the available database to identify the most relevant clusters in the data structure. These clusters are then either proposed to the user for navigation or used to drive the automated navigation process. Several approaches to automating the navigation are compared, and Natural Language Processing techniques that allow more precise and coherent computation of textual similarities are briefly described. The resulting StatSearch prototype was evaluated by the Swedish Statistical Office (SCB) on a sample of over 5000 English documents accessible through the SCB web site. The evaluation was based on supervised on-site usability testing and aimed to identify the prototype's potential and its added value for information access as objectively perceived by users. The evaluation method and the results obtained are presented. The case study was carried out in the framework of the NEMIS Network of Excellence in Text Mining and Its Applications in Statistics (IST-2001-37574).
id cern-896654
institution European Organization for Nuclear Research
language eng
publishDate 2005
record_format invenio
spelling cern-896654
2019-09-30T06:29:59Z
http://cds.cern.ch/record/896654
eng
Rajman, M
Vesely, M
Boynton, I M
Fridlund, B
Fyhrlund, A
Sundgren, B
Lundquist, P
Thelander, H
Wänerskär, M
Making Statistical Data More Easily Accessible on the Web: Results of the StatSearch Case Study
Information Transfer and Management
CERN-OPEN-2005-022
oai:cds.cern.ch:896654
2005
spellingShingle Information Transfer and Management
Rajman, M
Vesely, M
Boynton, I M
Fridlund, B
Fyhrlund, A
Sundgren, B
Lundquist, P
Thelander, H
Wänerskär, M
Making Statistical Data More Easily Accessible on the Web: Results of the StatSearch Case Study
title Making Statistical Data More Easily Accessible on the Web: Results of the StatSearch Case Study
title_full Making Statistical Data More Easily Accessible on the Web: Results of the StatSearch Case Study
title_fullStr Making Statistical Data More Easily Accessible on the Web: Results of the StatSearch Case Study
title_full_unstemmed Making Statistical Data More Easily Accessible on the Web: Results of the StatSearch Case Study
title_short Making Statistical Data More Easily Accessible on the Web: Results of the StatSearch Case Study
title_sort making statistical data more easily accessible on the web: results of the statsearch case study
topic Information Transfer and Management
url http://cds.cern.ch/record/896654
work_keys_str_mv AT rajmanm makingstatisticaldatamoreeasilyaccessibleonthewebresultsofthestatsearchcasestudy
AT veselym makingstatisticaldatamoreeasilyaccessibleonthewebresultsofthestatsearchcasestudy
AT boyntonim makingstatisticaldatamoreeasilyaccessibleonthewebresultsofthestatsearchcasestudy
AT fridlundb makingstatisticaldatamoreeasilyaccessibleonthewebresultsofthestatsearchcasestudy
AT fyhrlunda makingstatisticaldatamoreeasilyaccessibleonthewebresultsofthestatsearchcasestudy
AT sundgrenb makingstatisticaldatamoreeasilyaccessibleonthewebresultsofthestatsearchcasestudy
AT lundquistp makingstatisticaldatamoreeasilyaccessibleonthewebresultsofthestatsearchcasestudy
AT thelanderh makingstatisticaldatamoreeasilyaccessibleonthewebresultsofthestatsearchcasestudy
AT wanerskarm makingstatisticaldatamoreeasilyaccessibleonthewebresultsofthestatsearchcasestudy
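
The description above outlines an algorithm that computes query-related relevance scores over the documents and uses their distribution across a tree-like hierarchy to pick the most relevant clusters for navigation. The record gives no implementation details, so the following Python sketch is only an illustration of that general idea, not the StatSearch algorithm itself. The names `Node`, `cosine`, `subtree_score`, `navigate`, the `dominance` threshold, the bag-of-words cosine scoring, and the sample tree are all assumptions introduced here.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node in a hypothetical document hierarchy; leaves carry text."""
    name: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def cosine(query: str, text: str) -> float:
    """Bag-of-words cosine similarity (an assumed stand-in for the real scoring)."""
    q = Counter(query.lower().split())
    t = Counter(text.lower().split())
    dot = sum(q[w] * t[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in t.values())
    )
    return dot / norm if norm else 0.0


def subtree_score(node: Node, query: str) -> float:
    """Total relevance mass of the node's own text plus everything below it."""
    own = cosine(query, node.text) if node.text else 0.0
    return own + sum(subtree_score(child, query) for child in node.children)


def navigate(root: Node, query: str, dominance: float = 0.6) -> List[str]:
    """Descend while one child holds at least `dominance` of the relevance mass.

    The threshold rule is an assumption: when no child cluster dominates,
    the walk stops so that the choice can be left to the user.
    """
    path, node = [root.name], root
    while node.children:
        scores = [subtree_score(child, query) for child in node.children]
        total = sum(scores)
        if total == 0:
            break  # nothing relevant below this node
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] / total < dominance:
            break  # no dominant cluster at this level
        node = node.children[best]
        path.append(node.name)
    return path


# Hypothetical sample hierarchy, invented for illustration only.
example_tree = Node("SCB", children=[
    Node("Population", children=[
        Node("births", "births and deaths statistics by region"),
        Node("migration", "immigration and emigration statistics"),
    ]),
    Node("Labour", children=[
        Node("wages", "average wages by sector"),
        Node("unemployment", "unemployment rate by region and age"),
    ]),
])

print(navigate(example_tree, "unemployment rate"))
```

In this sketch a specific query such as "unemployment rate" concentrates all relevance mass in one branch, so the walk descends to a single leaf, while a broad query such as "region" spreads the mass evenly over two branches and the walk stops at the root.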