MeSH Up: effective MeSH text classification for improved document retrieval

Bibliographic Details
Main authors: Trieschnigg, Dolf; Pezik, Piotr; Lee, Vivian; de Jong, Franciska; Kraaij, Wessel; Rebholz-Schuhmann, Dietrich
Format: Text
Language: English
Published: Oxford University Press 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2682526/
https://www.ncbi.nlm.nih.gov/pubmed/19376821
http://dx.doi.org/10.1093/bioinformatics/btp249
author Trieschnigg, Dolf
Pezik, Piotr
Lee, Vivian
de Jong, Franciska
Kraaij, Wessel
Rebholz-Schuhmann, Dietrich
collection PubMed
description Motivation: Controlled vocabularies such as the Medical Subject Headings (MeSH) thesaurus and the Gene Ontology (GO) provide an efficient way of accessing and organizing biomedical information by reducing the ambiguity inherent to free-text data. Different methods of automating the assignment of MeSH concepts have been proposed to replace manual annotation, but they are either limited to a small subset of MeSH or have only been compared with a limited number of other systems. Results: We compare the performance of six MeSH classification systems [MetaMap, EAGL, a language and a vector space model-based approach, a K-Nearest Neighbor (KNN) approach and MTI] in terms of reproducing and complementing manual MeSH annotations. A KNN system clearly outperforms the other published approaches and scales well with large amounts of text using the full MeSH thesaurus. Our measurements demonstrate to what extent manual MeSH annotations can be reproduced and how they can be complemented by automatic annotations. We also show that a statistically significant improvement can be obtained in information retrieval (IR) when the text of a user's query is automatically annotated with MeSH concepts, compared to using the original textual query alone. Conclusions: The annotation of biomedical texts using controlled vocabularies such as MeSH can be automated to improve text-only IR. Furthermore, the automatic MeSH annotation system we propose is highly scalable and it generates improvements in IR comparable with those observed for manual annotations. Contact: trieschn@ewi.utwente.nl Supplementary information: Supplementary data are available at Bioinformatics online.
format Text
id pubmed-2682526
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2682526 2009-05-15 Bioinformatics Original Papers Oxford University Press 2009-06-01 2009-04-17 /pmc/articles/PMC2682526/ /pubmed/19376821 http://dx.doi.org/10.1093/bioinformatics/btp249 Text en © 2009 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title MeSH Up: effective MeSH text classification for improved document retrieval
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2682526/
https://www.ncbi.nlm.nih.gov/pubmed/19376821
http://dx.doi.org/10.1093/bioinformatics/btp249