MeSH indexing based on automatically generated summaries

Bibliographic Details
Main Authors:	Jimeno-Yepes, Antonio J; Plaza, Laura; Mork, James G; Aronson, Alan R; Díaz, Alberto
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3706357/ https://www.ncbi.nlm.nih.gov/pubmed/23802936 http://dx.doi.org/10.1186/1471-2105-14-208

author	Jimeno-Yepes, Antonio J; Plaza, Laura; Mork, James G; Aronson, Alan R; Díaz, Alberto
collection	PubMed
description	BACKGROUND: MEDLINE citations are manually indexed at the U.S. National Library of Medicine (NLM) using the Medical Subject Headings (MeSH) controlled vocabulary as reference. For this task, human indexers read the full text of the article. Because of the growth of MEDLINE, the NLM Indexing Initiative explores indexing methodologies that can support the indexers' work. The Medical Text Indexer (MTI) is a tool developed by the NLM Indexing Initiative to provide MeSH indexing recommendations to indexers. Currently, MTI takes only MEDLINE citations (title and abstract) as input. Previous work has shown that using the full text as input to MTI increases recall but sharply decreases precision. We propose using summaries automatically generated from the full text as input to MTI when suggesting MeSH headings to indexers. Summaries distill the most salient information from the full text, which might increase the coverage of automatic indexing approaches based on MEDLINE. We hypothesize that, if the results were good enough, manual indexers could use automatic summaries instead of the full texts, along with the recommendations of MTI, to speed up the process while maintaining high-quality indexing. RESULTS: We generated summaries of different lengths using two different summarizers and evaluated MTI indexing on the summaries using different algorithms: MTI, individual MTI components, and machine learning. The results are compared to those obtained from full text articles and MEDLINE citations. Our results show that automatically generated summaries achieve similar recall but higher precision compared to full text articles. Compared to MEDLINE citations, summaries achieve higher recall but lower precision. CONCLUSIONS: Our results show that automatic summaries produce better indexing than full text articles.
Summaries yield recall similar to that of full text but much better precision, which suggests that automatic summaries can efficiently capture the most important content of the original articles. Combining MEDLINE citations with automatically generated summaries could improve the recommendations suggested by MTI. However, indexing performance might depend on the MeSH heading being indexed. Summarization techniques could thus be considered a feature selection algorithm that might have to be tuned individually for each MeSH heading.
format	Online Article Text
id	pubmed-3706357
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3706357 2013-07-15 MeSH indexing based on automatically generated summaries Jimeno-Yepes, Antonio J; Plaza, Laura; Mork, James G; Aronson, Alan R; Díaz, Alberto BMC Bioinformatics Research Article BioMed Central 2013-06-26 /pmc/articles/PMC3706357/ /pubmed/23802936 http://dx.doi.org/10.1186/1471-2105-14-208 Text en Copyright © 2013 Jimeno-Yepes et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	MeSH indexing based on automatically generated summaries
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3706357/ https://www.ncbi.nlm.nih.gov/pubmed/23802936 http://dx.doi.org/10.1186/1471-2105-14-208
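The abstract compares MTI inputs (MEDLINE citations, full text, automatic summaries) by precision and recall against the headings assigned by human indexers. As a minimal sketch of how such a comparison can be scored, assuming each input yields a set of suggested MeSH headings and that counts are micro-averaged over articles (the record does not state the exact averaging used), the Python below computes precision, recall and F1. The function `micro_prf`, the union-based combination of citation and summary suggestions, and all heading sets are hypothetical illustrations invented here; they are not MTI's behavior or the paper's data.

```python
from typing import Iterable, Set, Tuple


def micro_prf(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over (suggested, gold) MeSH heading sets."""
    tp = fp = fn = 0
    for suggested, gold in pairs:
        tp += len(suggested & gold)  # suggested headings the indexer also assigned
        fp += len(suggested - gold)  # suggested headings the indexer did not assign
        fn += len(gold - suggested)  # indexer headings that were never suggested
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data for one article: invented for illustration, NOT results from the paper.
gold = {"Humans", "Neoplasms", "Gene Expression", "Mice"}
suggestions = {
    "citation": {"Humans", "Gene Expression"},
    "summary": {"Humans", "Neoplasms", "Mice", "Rats"},
    "full text": {"Humans", "Neoplasms", "Mice", "Rats", "Cell Line", "Apoptosis"},
}
# One simple, assumed way to combine two inputs: take the union of their suggestions.
suggestions["citation + summary"] = suggestions["citation"] | suggestions["summary"]

for source, suggested in suggestions.items():
    p, r, f = micro_prf([(suggested, gold)])
    print(f"{source:<20} P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Micro-averaging pools true/false positive counts across all articles, so articles with many headings weigh more; a macro average (mean of per-article scores) is an equally plausible choice, and per-heading scores would be needed to examine the abstract's point that performance may depend on the MeSH heading being indexed.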

Similar Items