Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions

Bibliographic Details
Main Authors: Biega, Asia J., Schmidt, Jana, Roy, Rishiraj Saha
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148029/
http://dx.doi.org/10.1007/978-3-030-45442-5_14
author Biega, Asia J.
Schmidt, Jana
Roy, Rishiraj Saha
collection PubMed
description Detailed query histories often contain a precise picture of a person’s life, including sensitive and personally identifiable information. As sanitization of such logs is an unsolved research problem, commercial Web search engines that possess large datasets of this kind at their disposal refrain from disseminating them to the wider research community. Ironically, studies examining privacy in search often require detailed search logs with user profiles. This paper builds on an observation that information needs are also expressed in the form of questions in online Community Question Answering (CQA) communities. We take a step towards understanding the process of formulating queries from questions to form a basis for automatic derivation of search logs from CQA forums. Specifically, we sample natural language (NL) questions spanning diverse themes from the StackExchange platform, and conduct a large-scale conversion experiment where crowdworkers submit search queries they would use when looking for equivalent information. We also release a dataset of 7,000 question-query pairs from our study.
format Online
Article
Text
id pubmed-7148029
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7148029 2020-04-13 Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions Biega, Asia J. Schmidt, Jana Roy, Rishiraj Saha Advances in Information Retrieval Article 2020-03-24 /pmc/articles/PMC7148029/ http://dx.doi.org/10.1007/978-3-030-45442-5_14 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148029/
http://dx.doi.org/10.1007/978-3-030-45442-5_14