A Systematic Approach to Configuring MetaMap for Optimal Performance
Background MetaMap is a valuable tool for processing biomedical texts to identify concepts. Although MetaMap is highly configurative, configuration decisions are not straightforward. Objective To develop a systematic, data-driven methodology for configuring MetaMap for optimal performance. Methods...
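Main authors: | Jing, Xia; Indani, Akash; Hubig, Nina; Min, Hua; Gong, Yang; Cimino, James J.; Sittig, Dean F.; Rennert, Lior; Robinson, David; Biondich, Paul; Wright, Adam; Nøhr, Christian; Law, Timothy; Faxvaag, Arild; Gimbel, Ronald |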
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Georg Thieme Verlag KG 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9788913/ https://www.ncbi.nlm.nih.gov/pubmed/35613942 http://dx.doi.org/10.1055/a-1862-0421 |
_version_ | 1784858859517509632 |
---|---|
author | Jing, Xia Indani, Akash Hubig, Nina Min, Hua Gong, Yang Cimino, James J. Sittig, Dean F. Rennert, Lior Robinson, David Biondich, Paul Wright, Adam Nøhr, Christian Law, Timothy Faxvaag, Arild Gimbel, Ronald |
author_facet | Jing, Xia Indani, Akash Hubig, Nina Min, Hua Gong, Yang Cimino, James J. Sittig, Dean F. Rennert, Lior Robinson, David Biondich, Paul Wright, Adam Nøhr, Christian Law, Timothy Faxvaag, Arild Gimbel, Ronald |
author_sort | Jing, Xia |
collection | PubMed |
description | Background MetaMap is a valuable tool for processing biomedical texts to identify concepts. Although MetaMap is highly configurative, configuration decisions are not straightforward. Objective To develop a systematic, data-driven methodology for configuring MetaMap for optimal performance. Methods MetaMap, the word2vec model, and the phrase model were used to build a pipeline. For unsupervised training, the phrase and word2vec models used abstracts related to clinical decision support as input. During testing, MetaMap was configured with the default option, one behavior option, and two behavior options. For each configuration, cosine and soft cosine similarity scores between identified entities and gold-standard terms were computed for 40 annotated abstracts (422 sentences). The similarity scores were used to calculate and compare the overall percentages of exact matches, similar matches, and missing gold-standard terms among the abstracts for each configuration. The results were manually spot-checked. The precision, recall, and F-measure (β = 1) were calculated. Results The percentages of exact matches and missing gold-standard terms were 0.6–0.79 and 0.09–0.3 for one behavior option, and 0.56–0.8 and 0.09–0.3 for two behavior options, respectively. The percentages of exact matches and missing terms for soft cosine similarity scores exceeded those for cosine similarity scores. The average precision, recall, and F-measure were 0.59, 0.82, and 0.68 for exact matches, and 1.00, 0.53, and 0.69 for missing terms, respectively. Conclusion We demonstrated a systematic approach that provides objective and accurate evidence guiding MetaMap configurations for optimizing performance. Combining objective evidence and the current practice of using principles, experience, and intuitions outperforms a single strategy in MetaMap configurations. Our methodology, reference codes, measurements, results, and workflow are valuable references for optimizing and configuring MetaMap. |
format | Online Article Text |
id | pubmed-9788913 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Georg Thieme Verlag KG |
record_format | MEDLINE/PubMed |
spelling | pubmed-9788913 2022-12-24 A Systematic Approach to Configuring MetaMap for Optimal Performance Jing, Xia Indani, Akash Hubig, Nina Min, Hua Gong, Yang Cimino, James J. Sittig, Dean F. Rennert, Lior Robinson, David Biondich, Paul Wright, Adam Nøhr, Christian Law, Timothy Faxvaag, Arild Gimbel, Ronald Methods Inf Med Background MetaMap is a valuable tool for processing biomedical texts to identify concepts. Although MetaMap is highly configurative, configuration decisions are not straightforward. Objective To develop a systematic, data-driven methodology for configuring MetaMap for optimal performance. Methods MetaMap, the word2vec model, and the phrase model were used to build a pipeline. For unsupervised training, the phrase and word2vec models used abstracts related to clinical decision support as input. During testing, MetaMap was configured with the default option, one behavior option, and two behavior options. For each configuration, cosine and soft cosine similarity scores between identified entities and gold-standard terms were computed for 40 annotated abstracts (422 sentences). The similarity scores were used to calculate and compare the overall percentages of exact matches, similar matches, and missing gold-standard terms among the abstracts for each configuration. The results were manually spot-checked. The precision, recall, and F-measure (β = 1) were calculated. Results The percentages of exact matches and missing gold-standard terms were 0.6–0.79 and 0.09–0.3 for one behavior option, and 0.56–0.8 and 0.09–0.3 for two behavior options, respectively. The percentages of exact matches and missing terms for soft cosine similarity scores exceeded those for cosine similarity scores. The average precision, recall, and F-measure were 0.59, 0.82, and 0.68 for exact matches, and 1.00, 0.53, and 0.69 for missing terms, respectively. Conclusion We demonstrated a systematic approach that provides objective and accurate evidence guiding MetaMap configurations for optimizing performance. Combining objective evidence and the current practice of using principles, experience, and intuitions outperforms a single strategy in MetaMap configurations. Our methodology, reference codes, measurements, results, and workflow are valuable references for optimizing and configuring MetaMap. Georg Thieme Verlag KG 2022-09-19 /pmc/articles/PMC9788913/ /pubmed/35613942 http://dx.doi.org/10.1055/a-1862-0421 Text en The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. ( https://creativecommons.org/licenses/by-nc-nd/4.0/ ) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License, which permits unrestricted reproduction and distribution, for non-commercial purposes only; and use and reproduction, but not distribution, of adapted material for non-commercial purposes only, provided the original work is properly cited. |
spellingShingle | Jing, Xia Indani, Akash Hubig, Nina Min, Hua Gong, Yang Cimino, James J. Sittig, Dean F. Rennert, Lior Robinson, David Biondich, Paul Wright, Adam Nøhr, Christian Law, Timothy Faxvaag, Arild Gimbel, Ronald A Systematic Approach to Configuring MetaMap for Optimal Performance |
title | A Systematic Approach to Configuring MetaMap for Optimal Performance |
title_full | A Systematic Approach to Configuring MetaMap for Optimal Performance |
title_fullStr | A Systematic Approach to Configuring MetaMap for Optimal Performance |
title_full_unstemmed | A Systematic Approach to Configuring MetaMap for Optimal Performance |
title_short | A Systematic Approach to Configuring MetaMap for Optimal Performance |
title_sort | systematic approach to configuring metamap for optimal performance |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9788913/ https://www.ncbi.nlm.nih.gov/pubmed/35613942 http://dx.doi.org/10.1055/a-1862-0421 |
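
The description above refers to cosine similarity scores and to precision, recall, and the F-measure (β = 1). As a minimal illustration of those two standard formulas, and not the authors' reference code, the sketch below uses hypothetical function names (`cosine_similarity`, `f_measure`) and toy vectors; the soft cosine variant, which additionally needs a term-similarity matrix, is not shown.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def f_measure(precision, recall, beta=1.0):
    """Standard F-beta score; beta=1 weights precision and recall equally."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Toy vectors (hypothetical), standing in for term embeddings.
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
    # Precision and recall reported in the description for missing terms.
    print(round(f_measure(1.00, 0.53), 2))
```

With the precision and recall reported for missing terms (1.00 and 0.53), the formula gives roughly 0.69, in line with the F-measure stated in the description.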