Natural language processing (NLP) tools in extracting biomedical concepts from research articles: a case study on autism spectrum disorder

BACKGROUND: Natural language processing (NLP) tools can facilitate the extraction of biomedical concepts from unstructured free texts, such as research articles or clinical notes. The NLP software tools CLAMP, cTAKES, and MetaMap are among the most widely used tools to extract biomedical concept ent...

Bibliographic Details
Main Authors: Peng, Jacqueline; Zhao, Mengge; Havrilla, James; Liu, Cong; Weng, Chunhua; Guthrie, Whitney; Schultz, Robert; Wang, Kai; Zhou, Yunyun
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7772897/
https://www.ncbi.nlm.nih.gov/pubmed/33380331
http://dx.doi.org/10.1186/s12911-020-01352-2
_version_ 1783629959936344064
author Peng, Jacqueline
Zhao, Mengge
Havrilla, James
Liu, Cong
Weng, Chunhua
Guthrie, Whitney
Schultz, Robert
Wang, Kai
Zhou, Yunyun
author_facet Peng, Jacqueline
Zhao, Mengge
Havrilla, James
Liu, Cong
Weng, Chunhua
Guthrie, Whitney
Schultz, Robert
Wang, Kai
Zhou, Yunyun
author_sort Peng, Jacqueline
collection PubMed
description BACKGROUND: Natural language processing (NLP) tools can facilitate the extraction of biomedical concepts from unstructured free texts, such as research articles or clinical notes. The NLP software tools CLAMP, cTAKES, and MetaMap are among the most widely used tools to extract biomedical concept entities. However, their performance in extracting disease-specific terminology from literature has not been compared extensively, especially for complex neuropsychiatric disorders with a diverse set of phenotypic and clinical manifestations. METHODS: We comparatively evaluated these NLP tools using autism spectrum disorder (ASD) as a case study. We collected 827 ASD-related terms based on previous literature as the benchmark list for performance evaluation. Then, we applied CLAMP, cTAKES, and MetaMap on 544 full-text articles and 20,408 abstracts from PubMed to extract ASD-related terms. We evaluated the predictive performance using precision, recall, and F1 score. RESULTS: We found that CLAMP has the best performance in terms of F1 score followed by cTAKES and then MetaMap. Our results show that CLAMP has much higher precision than cTAKES and MetaMap, while cTAKES and MetaMap have higher recall than CLAMP. CONCLUSION: The analysis protocols used in this study can be applied to other neuropsychiatric or neurodevelopmental disorders that lack well-defined terminology sets to describe their phenotypic presentations.
format Online
Article
Text
id pubmed-7772897
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7772897 2020-12-30 Natural language processing (NLP) tools in extracting biomedical concepts from research articles: a case study on autism spectrum disorder Peng, Jacqueline Zhao, Mengge Havrilla, James Liu, Cong Weng, Chunhua Guthrie, Whitney Schultz, Robert Wang, Kai Zhou, Yunyun BMC Med Inform Decis Mak Research BACKGROUND: Natural language processing (NLP) tools can facilitate the extraction of biomedical concepts from unstructured free texts, such as research articles or clinical notes. The NLP software tools CLAMP, cTAKES, and MetaMap are among the most widely used tools to extract biomedical concept entities. However, their performance in extracting disease-specific terminology from literature has not been compared extensively, especially for complex neuropsychiatric disorders with a diverse set of phenotypic and clinical manifestations. METHODS: We comparatively evaluated these NLP tools using autism spectrum disorder (ASD) as a case study. We collected 827 ASD-related terms based on previous literature as the benchmark list for performance evaluation. Then, we applied CLAMP, cTAKES, and MetaMap on 544 full-text articles and 20,408 abstracts from PubMed to extract ASD-related terms. We evaluated the predictive performance using precision, recall, and F1 score. RESULTS: We found that CLAMP has the best performance in terms of F1 score followed by cTAKES and then MetaMap. Our results show that CLAMP has much higher precision than cTAKES and MetaMap, while cTAKES and MetaMap have higher recall than CLAMP. CONCLUSION: The analysis protocols used in this study can be applied to other neuropsychiatric or neurodevelopmental disorders that lack well-defined terminology sets to describe their phenotypic presentations.
BioMed Central 2020-12-30 /pmc/articles/PMC7772897/ /pubmed/33380331 http://dx.doi.org/10.1186/s12911-020-01352-2 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Peng, Jacqueline
Zhao, Mengge
Havrilla, James
Liu, Cong
Weng, Chunhua
Guthrie, Whitney
Schultz, Robert
Wang, Kai
Zhou, Yunyun
Natural language processing (NLP) tools in extracting biomedical concepts from research articles: a case study on autism spectrum disorder
title Natural language processing (NLP) tools in extracting biomedical concepts from research articles: a case study on autism spectrum disorder
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7772897/
https://www.ncbi.nlm.nih.gov/pubmed/33380331
http://dx.doi.org/10.1186/s12911-020-01352-2
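
The abstract in this record says the tools were scored with precision, recall, and F1 against a benchmark list of ASD-related terms. The following is a minimal sketch of how such a comparison could be computed for one tool. It is not the authors' code: the function name `evaluate_terms`, the case-insensitive exact-match rule, and the toy term lists are all assumptions made for illustration, since the record does not say how extracted terms were matched to the benchmark.

```python
def evaluate_terms(extracted, benchmark):
    """Return (precision, recall, f1) for extracted terms versus a benchmark list.

    Assumption: a term counts as correct only if it matches a benchmark
    term exactly after lowercasing and trimming whitespace.
    """
    extracted_set = {term.strip().lower() for term in extracted}
    benchmark_set = {term.strip().lower() for term in benchmark}

    # True positives: extracted terms that also appear in the benchmark.
    true_positives = len(extracted_set & benchmark_set)

    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(benchmark_set) if benchmark_set else 0.0

    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data, not taken from the study.
toy_benchmark = ["autism", "echolalia", "repetitive behavior", "social anxiety"]
toy_extracted = ["Autism", "echolalia", "headache"]

precision, recall, f1 = evaluate_terms(toy_extracted, toy_benchmark)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

With the toy lists, two of the three extracted terms are in the benchmark and two of the four benchmark terms are found, which illustrates the trade-off the abstract reports: a tool can have high precision with lower recall, or the reverse.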