
Searching for Cancer Information on the Internet: Analyzing Natural Language Search Queries

BACKGROUND: Searching for health information is one of the most-common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive...


Bibliographic Details
Main authors: Bader, Judith L, Theofanos, Mary Frances
Format: Text
Language: English
Published: Gunther Eysenbach 2003
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1550578/
https://www.ncbi.nlm.nih.gov/pubmed/14713659
http://dx.doi.org/10.2196/jmir.5.4.e31
author Bader, Judith L
Theofanos, Mary Frances
collection PubMed
description BACKGROUND: Searching for health information is one of the most common tasks performed by Internet users. Many users begin searching on popular search engines rather than on prominent health information sites. We know that many visitors to our (National Cancer Institute) Web site, cancer.gov, arrive via links in search engine results. OBJECTIVE: To learn more about the specific needs of our general-public users, we wanted to understand what lay users really wanted to know about cancer, how they phrased their questions, and how much detail they used. METHODS: The National Cancer Institute partnered with AskJeeves, Inc to develop a methodology to capture, sample, and analyze 3 months of cancer-related queries on the Ask.com Web site, a prominent United States consumer search engine, which receives over 35 million queries per week. Using a benchmark set of 500 terms and word roots supplied by the National Cancer Institute, AskJeeves identified a test sample of cancer queries for 1 week in August 2001. From these 500 terms only 37 appeared ≥ 5 times/day over the trial test week in 17208 queries. Using these 37 terms, 204165 instances of cancer queries were found in the Ask.com query logs for the actual test period of June-August 2001. Of these, 7500 individual user questions were randomly selected for detailed analysis and assigned to appropriate categories. The exact language of sample queries is presented. RESULTS: Considering multiples of the same questions, the sample of 7500 individual user queries represented 76077 queries (37% of the total 3-month pool). Overall, 78.37% of sampled cancer queries asked about 14 specific cancer types. Within each cancer type, queries were sorted into appropriate subcategories including at least the following: General Information, Symptoms, Diagnosis and Testing, Treatment, Statistics, Definition, and Cause/Risk/Link. The most common specific cancer types mentioned in queries were Digestive/Gastrointestinal/Bowel (15.0%), Breast (11.7%), Skin (11.3%), and Genitourinary (10.5%). Additional subcategories of queries about specific cancer types varied, depending on user input. Queries that were not specific to a cancer type were also tracked and categorized. CONCLUSIONS: Natural-language searching affords users the opportunity to fully express their information needs and can aid users naïve to the content and vocabulary. The specific queries analyzed for this study reflect news and research studies reported during the study dates and would surely change with different study dates. Analyzing queries from search engines represents one way of knowing what kinds of content to provide to users of a given Web site. Users ask questions using whole sentences and keywords, often misspelling words. Providing the option for natural-language searching does not obviate the need for good information architecture, usability engineering, and user testing in order to optimize user experience.
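The term-screening step and the 37% figure reported in the abstract can be illustrated with a short sketch. This is not the authors' or AskJeeves's actual pipeline, which this record does not describe. The function names `frequent_terms` and `share_of_pool`, the use of case-insensitive substring matching, and the reading of "≥ 5 times/day" as "at least five matching queries on every day of the trial week" are all assumptions made for illustration.

```python
def frequent_terms(queries_by_day, terms, min_per_day=5):
    """Keep terms matched by at least `min_per_day` queries on every day.

    queries_by_day: list of days, each a list of query strings (hypothetical input).
    terms: candidate terms or word roots, matched as case-insensitive substrings.
    """
    kept = []
    for term in terms:
        needle = term.lower()
        # Count, for each day, how many queries contain the term.
        daily_counts = [
            sum(1 for query in day if needle in query.lower())
            for day in queries_by_day
        ]
        # Assumption: the threshold must hold on each day, not on average.
        if daily_counts and min(daily_counts) >= min_per_day:
            kept.append(term)
    return kept


def share_of_pool(represented, pool):
    """Percentage of the full query pool covered by the sampled questions."""
    return 100.0 * represented / pool


if __name__ == "__main__":
    # Toy data, invented for the example: seven identical days of queries.
    one_day = ["breast cancer symptoms"] * 5 + ["what is melanoma"] * 2
    week = [one_day] * 7
    print(frequent_terms(week, ["breast", "melanoma", "leukemia"]))

    # Figures taken from the abstract: 76077 of 204165 queries.
    print(round(share_of_pool(76077, 204165), 1))
```

With the abstract's figures, 76077 of 204165 queries is about 37.3%, consistent with the reported 37%.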
format Text
id pubmed-1550578
institution National Center for Biotechnology Information
language English
publishDate 2003
publisher Gunther Eysenbach
record_format MEDLINE/PubMed
spelling pubmed-1550578 2006-10-13 Searching for Cancer Information on the Internet: Analyzing Natural Language Search Queries Bader, Judith L Theofanos, Mary Frances J Med Internet Res Original Paper Gunther Eysenbach 2003-12-11 /pmc/articles/PMC1550578/ /pubmed/14713659 http://dx.doi.org/10.2196/jmir.5.4.e31 Text en
© Judith L Bader, Mary Frances Theofanos. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 11.12.2003. Except where otherwise noted, articles published in the Journal of Medical Internet Research are distributed under the terms of the Creative Commons Attribution License (http://www.creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited, including full bibliographic details and the URL (see "please cite as" above), and this statement is included.
title Searching for Cancer Information on the Internet: Analyzing Natural Language Search Queries
topic Original Paper