
A Systematic Approach to Configuring MetaMap for Optimal Performance

Background  MetaMap is a valuable tool for processing biomedical texts to identify concepts. Although MetaMap is highly configurative, configuration decisions are not straightforward. Objective  To develop a systematic, data-driven methodology for configuring MetaMap for optimal performance. Methods...


Bibliographic Details
Main authors: Jing, Xia, Indani, Akash, Hubig, Nina, Min, Hua, Gong, Yang, Cimino, James J., Sittig, Dean F., Rennert, Lior, Robinson, David, Biondich, Paul, Wright, Adam, Nøhr, Christian, Law, Timothy, Faxvaag, Arild, Gimbel, Ronald
Format: Online Article Text
Language: English
Published: Georg Thieme Verlag KG 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9788913/
https://www.ncbi.nlm.nih.gov/pubmed/35613942
http://dx.doi.org/10.1055/a-1862-0421
author Jing, Xia
Indani, Akash
Hubig, Nina
Min, Hua
Gong, Yang
Cimino, James J.
Sittig, Dean F.
Rennert, Lior
Robinson, David
Biondich, Paul
Wright, Adam
Nøhr, Christian
Law, Timothy
Faxvaag, Arild
Gimbel, Ronald
collection PubMed
description Background: MetaMap is a valuable tool for processing biomedical texts to identify concepts. Although MetaMap is highly configurative, configuration decisions are not straightforward.
Objective: To develop a systematic, data-driven methodology for configuring MetaMap for optimal performance.
Methods: MetaMap, the word2vec model, and the phrase model were used to build a pipeline. For unsupervised training, the phrase and word2vec models used abstracts related to clinical decision support as input. During testing, MetaMap was configured with the default option, one behavior option, and two behavior options. For each configuration, cosine and soft cosine similarity scores between identified entities and gold-standard terms were computed for 40 annotated abstracts (422 sentences). The similarity scores were used to calculate and compare the overall percentages of exact matches, similar matches, and missing gold-standard terms among the abstracts for each configuration. The results were manually spot-checked. The precision, recall, and F-measure (β = 1) were calculated.
Results: The percentages of exact matches and missing gold-standard terms were 0.6–0.79 and 0.09–0.3 for one behavior option, and 0.56–0.8 and 0.09–0.3 for two behavior options, respectively. The percentages of exact matches and missing terms for soft cosine similarity scores exceeded those for cosine similarity scores. The average precision, recall, and F-measure were 0.59, 0.82, and 0.68 for exact matches, and 1.00, 0.53, and 0.69 for missing terms, respectively.
Conclusion: We demonstrated a systematic approach that provides objective and accurate evidence guiding MetaMap configurations for optimizing performance. Combining objective evidence and the current practice of using principles, experience, and intuitions outperforms a single strategy in MetaMap configurations. Our methodology, reference codes, measurements, results, and workflow are valuable references for optimizing and configuring MetaMap.
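The measures named in the description (cosine similarity, soft cosine similarity, and the F-measure with β = 1) can be illustrated with a minimal sketch. This is not the authors' reference code: the function names, the toy vectors, and the 2×2 feature-similarity matrix below are assumptions made for illustration only, whereas the study derived term similarities from a word2vec model.

```python
import math


def cosine(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def soft_cosine(a, b, sim):
    """Soft cosine similarity; sim[i][j] is the similarity of features i and j.

    With sim equal to the identity matrix this reduces to plain cosine.
    """
    def bilinear(u, v):
        return sum(u[i] * sim[i][j] * v[j]
                   for i in range(len(u)) for j in range(len(v)))

    denom = math.sqrt(bilinear(a, a) * bilinear(b, b))
    return bilinear(a, b) / denom if denom else 0.0


def f_measure(precision, recall, beta=1.0):
    """F-measure; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Hypothetical toy data: two terms that share no feature but whose
# features are assumed to be 0.5 similar to each other.
term_a = [1.0, 0.0]
term_b = [0.0, 1.0]
toy_sim = [[1.0, 0.5],
           [0.5, 1.0]]

print(cosine(term_a, term_b))                # 0.0
print(soft_cosine(term_a, term_b, toy_sim))  # 0.5
print(round(f_measure(1.00, 0.53), 2))       # 0.69
```

In the toy case, soft cosine gives partial credit to terms built from related features where plain cosine gives none. The last line applies the formula to the precision and recall reported for missing terms (1.00 and 0.53), which yields about 0.69 and is consistent with the reported F-measure.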
format Online
Article
Text
id pubmed-9788913
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Georg Thieme Verlag KG
record_format MEDLINE/PubMed
spelling pubmed-9788913 2022-12-24 A Systematic Approach to Configuring MetaMap for Optimal Performance Methods Inf Med Georg Thieme Verlag KG 2022-09-19 /pmc/articles/PMC9788913/ /pubmed/35613942 http://dx.doi.org/10.1055/a-1862-0421 Text en The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/) This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License, which permits unrestricted reproduction and distribution, for non-commercial purposes only; and use and reproduction, but not distribution, of adapted material for non-commercial purposes only, provided the original work is properly cited.
title A Systematic Approach to Configuring MetaMap for Optimal Performance
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9788913/
https://www.ncbi.nlm.nih.gov/pubmed/35613942
http://dx.doi.org/10.1055/a-1862-0421