Natural language processing (NLP) tools in extracting biomedical concepts from research articles: a case study on autism spectrum disorder
BACKGROUND: Natural language processing (NLP) tools can facilitate the extraction of biomedical concepts from unstructured free texts, such as research articles or clinical notes. The NLP software tools CLAMP, cTAKES, and MetaMap are among the most widely used tools to extract biomedical concept ent...
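Main authors: | Peng, Jacqueline; Zhao, Mengge; Havrilla, James; Liu, Cong; Weng, Chunhua; Guthrie, Whitney; Schultz, Robert; Wang, Kai; Zhou, Yunyun |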
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7772897/ https://www.ncbi.nlm.nih.gov/pubmed/33380331 http://dx.doi.org/10.1186/s12911-020-01352-2 |
_version_ | 1783629959936344064 |
---|---|
author | Peng, Jacqueline; Zhao, Mengge; Havrilla, James; Liu, Cong; Weng, Chunhua; Guthrie, Whitney; Schultz, Robert; Wang, Kai; Zhou, Yunyun |
author_sort | Peng, Jacqueline |
collection | PubMed |
description | BACKGROUND: Natural language processing (NLP) tools can facilitate the extraction of biomedical concepts from unstructured free texts, such as research articles or clinical notes. The NLP software tools CLAMP, cTAKES, and MetaMap are among the most widely used tools to extract biomedical concept entities. However, their performance in extracting disease-specific terminology from literature has not been compared extensively, especially for complex neuropsychiatric disorders with a diverse set of phenotypic and clinical manifestations. METHODS: We comparatively evaluated these NLP tools using autism spectrum disorder (ASD) as a case study. We collected 827 ASD-related terms based on previous literature as the benchmark list for performance evaluation. Then, we applied CLAMP, cTAKES, and MetaMap on 544 full-text articles and 20,408 abstracts from PubMed to extract ASD-related terms. We evaluated the predictive performance using precision, recall, and F1 score. RESULTS: We found that CLAMP has the best performance in terms of F1 score followed by cTAKES and then MetaMap. Our results show that CLAMP has much higher precision than cTAKES and MetaMap, while cTAKES and MetaMap have higher recall than CLAMP. CONCLUSION: The analysis protocols used in this study can be applied to other neuropsychiatric or neurodevelopmental disorders that lack well-defined terminology sets to describe their phenotypic presentations. |
format | Online Article Text |
id | pubmed-7772897 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7772897 2020-12-30 Natural language processing (NLP) tools in extracting biomedical concepts from research articles: a case study on autism spectrum disorder Peng, Jacqueline; Zhao, Mengge; Havrilla, James; Liu, Cong; Weng, Chunhua; Guthrie, Whitney; Schultz, Robert; Wang, Kai; Zhou, Yunyun BMC Med Inform Decis Mak Research
BioMed Central 2020-12-30 /pmc/articles/PMC7772897/ /pubmed/33380331 http://dx.doi.org/10.1186/s12911-020-01352-2 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Natural language processing (NLP) tools in extracting biomedical concepts from research articles: a case study on autism spectrum disorder |
topic | Research |
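
The description field above states that the three tools were compared by precision, recall, and F1 score against a benchmark list of ASD-related terms. The following is a minimal sketch of how such scores can be computed for one tool, assuming exact set matching after lowercasing and whitespace normalization. The record does not describe the authors' actual matching procedure, so `normalize`, `precision_recall_f1`, and the toy term lists are hypothetical and serve only as illustration.

```python
def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match.

    Assumption: exact matching after this normalization; the record does not
    say how the study matched extracted terms to benchmark terms.
    """
    return " ".join(term.lower().split())


def precision_recall_f1(extracted, benchmark):
    """Score a collection of extracted terms against a benchmark term list."""
    extracted_set = {normalize(t) for t in extracted}
    benchmark_set = {normalize(t) for t in benchmark}

    # Terms the tool extracted that also appear in the benchmark list.
    true_positives = len(extracted_set & benchmark_set)

    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(benchmark_set) if benchmark_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data, not taken from the study.
benchmark = ["repetitive behavior", "speech delay", "anxiety", "seizure"]
tool_output = ["Repetitive  behavior", "anxiety", "headache"]

p, r, f = precision_recall_f1(tool_output, benchmark)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case two of the three extracted terms are in the benchmark list and two of the four benchmark terms are found. A tool that extracts fewer but more accurate terms would score higher on precision and lower on recall, which is the kind of trade-off the description reports between CLAMP and the other two tools.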