Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions
Main authors: | Biega, Asia J.; Schmidt, Jana; Roy, Rishiraj Saha |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148029/ http://dx.doi.org/10.1007/978-3-030-45442-5_14 |
author | Biega, Asia J.; Schmidt, Jana; Roy, Rishiraj Saha |
collection | PubMed |
description | Detailed query histories often contain a precise picture of a person’s life, including sensitive and personally identifiable information. As sanitization of such logs is an unsolved research problem, commercial Web search engines that possess large datasets of this kind at their disposal refrain from disseminating them to the wider research community. Ironically, studies examining privacy in search often require detailed search logs with user profiles. This paper builds on an observation that information needs are also expressed in the form of questions in online Community Question Answering (CQA) communities. We take a step towards understanding the process of formulating queries from questions to form a basis for automatic derivation of search logs from CQA forums. Specifically, we sample natural language (NL) questions spanning diverse themes from the StackExchange platform, and conduct a large-scale conversion experiment where crowdworkers submit search queries they would use when looking for equivalent information. We also release a dataset of 7,000 question-query pairs from our study. |
format | Online Article Text |
id | pubmed-7148029 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
source | Advances in Information Retrieval (2020-03-24) |
rights | © Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
title | Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148029/ http://dx.doi.org/10.1007/978-3-030-45442-5_14 |