
The first step in the development of text mining technology for cancer risk assessment: identifying and organizing scientific evidence in risk assessment literature



Bibliographic Details
Main Authors:	Korhonen, Anna; Silins, Ilona; Sun, Lin; Stenius, Ulla
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2759963/ https://www.ncbi.nlm.nih.gov/pubmed/19772619 http://dx.doi.org/10.1186/1471-2105-10-303

author	Korhonen, Anna; Silins, Ilona; Sun, Lin; Stenius, Ulla
collection	PubMed
description	BACKGROUND: One of the most neglected areas of biomedical Text Mining (TM) is the development of systems based on carefully assessed user needs. We have recently investigated the user needs of an important task yet to be tackled by TM -- Cancer Risk Assessment (CRA). Here we take the first step towards the development of TM technology for the task: identifying and organizing the scientific evidence required for CRA in a taxonomy which is capable of supporting extensive data gathering from biomedical literature. RESULTS: The taxonomy is based on expert annotation of 1297 abstracts downloaded from relevant PubMed journals. It classifies 1742 unique keywords found in the corpus to 48 classes which specify core evidence required for CRA. We report promising results with inter-annotator agreement tests and automatic classification of PubMed abstracts to taxonomy classes. A simple user test is also reported in a near real-world CRA scenario which demonstrates along with other evaluation that the resources we have built are well-defined, accurate, and applicable in practice. CONCLUSION: We present our annotation guidelines and a tool which we have designed for expert annotation of PubMed abstracts. A corpus annotated for keywords and document relevance is also presented, along with the taxonomy which organizes the keywords into classes defining core evidence for CRA. As demonstrated by the evaluation, the materials we have constructed provide a good basis for classification of CRA literature along multiple dimensions. They can support current manual CRA as well as facilitate the development of an approach based on TM. We discuss extending the taxonomy further via manual and machine learning approaches and the subsequent steps required to develop TM technology for the needs of CRA.
format	Text
id	pubmed-2759963
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2759963 2009-10-11 BMC Bioinformatics Research Article BioMed Central 2009-09-22 /pmc/articles/PMC2759963/ /pubmed/19772619 http://dx.doi.org/10.1186/1471-2105-10-303 Text en Copyright © 2009 Korhonen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	The first step in the development of text mining technology for cancer risk assessment: identifying and organizing scientific evidence in risk assessment literature
topic	Research Article
