A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment

BACKGROUND: Many practical tasks in biomedicine require accessing specific types of information in scientific literature; e.g. information about the results or conclusions of the study in question. Several schemes have been developed to characterize such information in scientific journal articles. F...

Bibliographic Details
Main Authors: Guo, Yufan; Korhonen, Anna; Liakata, Maria; Silins, Ilona; Hogberg, Johan; Stenius, Ulla
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3060841/
https://www.ncbi.nlm.nih.gov/pubmed/21385430
http://dx.doi.org/10.1186/1471-2105-12-69
author Guo, Yufan
Korhonen, Anna
Liakata, Maria
Silins, Ilona
Hogberg, Johan
Stenius, Ulla
collection PubMed
description BACKGROUND: Many practical tasks in biomedicine require accessing specific types of information in scientific literature; e.g. information about the results or conclusions of the study in question. Several schemes have been developed to characterize such information in scientific journal articles. For example, a simple section-based scheme assigns individual sentences in abstracts under sections such as Objective, Methods, Results and Conclusions. Some schemes of textual information structure have proved useful for biomedical text mining (BIO-TM) tasks (e.g. automatic summarization). However, user-centered evaluation in the context of real-life tasks has been lacking.
METHODS: We take three schemes of different type and granularity - those based on section names, Argumentative Zones (AZ) and Core Scientific Concepts (CoreSC) - and evaluate their usefulness for a real-life task which focuses on biomedical abstracts: Cancer Risk Assessment (CRA). We annotate a corpus of CRA abstracts according to each scheme, develop classifiers for automatic identification of the schemes in abstracts, and evaluate both the manual and automatic classifications directly as well as in the context of CRA.
RESULTS: Our results show that for each scheme, the majority of categories appear in abstracts, although two of the schemes (AZ and CoreSC) were developed originally for full journal articles. All the schemes can be identified in abstracts relatively reliably using machine learning. Moreover, when cancer risk assessors are presented with scheme annotated abstracts, they find relevant information significantly faster than when presented with unannotated abstracts, even when the annotations are produced using an automatic classifier. Interestingly, in this user-based evaluation the coarse-grained scheme based on section names proved nearly as useful for CRA as the finest-grained CoreSC scheme.
CONCLUSIONS: We have shown that existing schemes aimed at capturing information structure of scientific documents can be applied to biomedical abstracts and can be identified in them automatically with an accuracy which is high enough to benefit a real-life task in biomedicine.
format Text
id pubmed-3060841
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3060841 2011-03-19 A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment. Guo, Yufan; Korhonen, Anna; Liakata, Maria; Silins, Ilona; Hogberg, Johan; Stenius, Ulla. BMC Bioinformatics. Research Article. BioMed Central 2011-03-08 /pmc/articles/PMC3060841/ /pubmed/21385430 http://dx.doi.org/10.1186/1471-2105-12-69 Text en Copyright ©2011 Guo et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment
topic Research Article