
MeSH: a window into full text for document summarization

Motivation: Previous research in the biomedical text-mining domain has historically been limited to titles, abstracts and metadata available in MEDLINE records. Recent research initiatives such as TREC Genomics and BioCreAtIvE strongly point to the merits of moving beyond abstracts and into the real...

Bibliographic Details
Main authors: Bhattacharya, Sanmitra, Ha-Thuc, Viet, Srinivasan, Padmini
Format: Online Article Text
Language: English
Published: Oxford University Press 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3117369/
https://www.ncbi.nlm.nih.gov/pubmed/21685060
http://dx.doi.org/10.1093/bioinformatics/btr223
author Bhattacharya, Sanmitra
Ha-Thuc, Viet
Srinivasan, Padmini
collection PubMed
description Motivation: Previous research in the biomedical text-mining domain has historically been limited to titles, abstracts and metadata available in MEDLINE records. Recent research initiatives such as TREC Genomics and BioCreAtIvE strongly point to the merits of moving beyond abstracts and into the realm of full texts. Full texts are, however, more expensive to process not only in terms of resources needed but also in terms of accuracy. Since full texts contain embellishments that elaborate, contextualize, contrast, supplement, etc., there is a greater risk of false positives. Motivated by this, we explore an approach that offers a compromise between the extremes of abstracts and full texts. Specifically, we create reduced versions of full-text documents that contain only important portions. In the long term, our goal is to explore the use of such summaries for functions such as document retrieval and information extraction. Here, we focus on designing summarization strategies. In particular, we explore the use of MeSH terms, manually assigned to documents by trained annotators, as clues to select important text segments from the full-text documents.
Results: Our experiments confirm the ability of our approach to pick the important text portions. Using the ROUGE measures for evaluation, we were able to achieve maximum ROUGE-1, ROUGE-2 and ROUGE-SU4 F-scores of 0.4150, 0.1435 and 0.1782, respectively, for our MeSH term-based method versus the maximum baseline scores of 0.3815, 0.1353 and 0.1428, respectively. Using a MeSH profile-based strategy, we were able to achieve maximum ROUGE F-scores of 0.4320, 0.1497 and 0.1887, respectively. Human evaluation of the baselines and our proposed strategies further corroborates the ability of our method to select important sentences from the full texts.
Contact: sanmitra-bhattacharya@uiowa.edu; padmini-srinivasan@uiowa.edu
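The abstract describes using MeSH terms as clues for picking important sentences and scoring the result with ROUGE. The record does not give the authors' actual algorithm, so the following is only a minimal sketch under stated assumptions: sentences are ranked by simple token overlap with the MeSH terms, and ROUGE-1 is computed as a plain unigram-overlap F-score without the stemming or stop-word options of the official ROUGE toolkit. The function names `select_sentences` and `rouge1_f` are hypothetical and used for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def select_sentences(sentences, mesh_terms, k=2):
    """Assumed strategy (not the paper's): keep the k sentences that share
    the most distinct tokens with the MeSH terms, in document order."""
    term_tokens = {tok for term in mesh_terms for tok in tokenize(term)}
    scored = []
    for idx, sentence in enumerate(sentences):
        overlap = len(term_tokens & set(tokenize(sentence)))
        scored.append((overlap, idx))
    # Highest overlap first; ties are broken by earlier position.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    return [sentences[idx] for _, idx in sorted(top, key=lambda pair: pair[1])]


def rouge1_f(candidate, reference):
    """Simplified ROUGE-1 F-score: clipped unigram overlap between a
    candidate summary and a single reference summary."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

In use, the selected sentences would be joined into a summary and compared with a reference such as the article's abstract, for example `rouge1_f(" ".join(select_sentences(sentences, mesh_terms)), abstract)`. The scores reported in the abstract come from the authors' own strategies and the standard ROUGE measures, not from this sketch.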
format Online
Article
Text
id pubmed-3117369
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3117369 2011-06-17 MeSH: a window into full text for document summarization Bhattacharya, Sanmitra Ha-Thuc, Viet Srinivasan, Padmini Bioinformatics ISMB/ECCB 2011 Proceedings Papers Committee July 17 to July 19, 2011, Vienna, Austria Oxford University Press 2011-07-01 2011-06-14 /pmc/articles/PMC3117369/ /pubmed/21685060 http://dx.doi.org/10.1093/bioinformatics/btr223 Text en © The Author(s) 2011. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title MeSH: a window into full text for document summarization
topic ISMB/ECCB 2011 Proceedings Papers Committee July 17 to July 19, 2011, Vienna, Austria