Biomedical and clinical English model packages for the Stanza Python NLP library

OBJECTIVE: The study sought to develop and evaluate neural natural language processing (NLP) packages for the syntactic analysis and named entity recognition of biomedical and clinical English text. MATERIALS AND METHODS: We implement and train biomedical and clinical English NLP pipelines by extend...

Bibliographic Details
Main Authors:	Zhang, Yuhao; Zhang, Yuhui; Qi, Peng; Manning, Christopher D; Langlotz, Curtis P
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	Research and Applications
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8363782/ https://www.ncbi.nlm.nih.gov/pubmed/34157094 http://dx.doi.org/10.1093/jamia/ocab090

_version_	1783738410889904128
author	Zhang, Yuhao Zhang, Yuhui Qi, Peng Manning, Christopher D Langlotz, Curtis P
author_facet	Zhang, Yuhao Zhang, Yuhui Qi, Peng Manning, Christopher D Langlotz, Curtis P
author_sort	Zhang, Yuhao
collection	PubMed
description	OBJECTIVE: The study sought to develop and evaluate neural natural language processing (NLP) packages for the syntactic analysis and named entity recognition of biomedical and clinical English text. MATERIALS AND METHODS: We implement and train biomedical and clinical English NLP pipelines by extending the widely used Stanza library originally designed for general NLP tasks. Our models are trained with a mix of public datasets such as the CRAFT treebank as well as with a private corpus of radiology reports annotated with 5 radiology-domain entities. The resulting pipelines are fully based on neural networks, and are able to perform tokenization, part-of-speech tagging, lemmatization, dependency parsing, and named entity recognition for both biomedical and clinical text. We compare our systems against popular open-source NLP libraries such as CoreNLP and scispaCy, state-of-the-art models such as the BioBERT models, and winning systems from the BioNLP CRAFT shared task. RESULTS: For syntactic analysis, our systems achieve much better performance compared with the released scispaCy models and CoreNLP models retrained on the same treebanks, and are on par with the winning system from the CRAFT shared task. For NER, our systems substantially outperform scispaCy, and are better or on par with the state-of-the-art performance from BioBERT, while being much more computationally efficient. CONCLUSIONS: We introduce biomedical and clinical NLP packages built for the Stanza library. These packages offer performance that is similar to the state of the art, and are also optimized for ease of use. To facilitate research, we make all our models publicly available. We also provide an online demonstration (http://stanza.run/bio).
format	Online Article Text
id	pubmed-8363782
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-8363782 2021-08-17 Biomedical and clinical English model packages for the Stanza Python NLP library Zhang, Yuhao Zhang, Yuhui Qi, Peng Manning, Christopher D Langlotz, Curtis P J Am Med Inform Assoc Research and Applications
Oxford University Press 2021-06-22 /pmc/articles/PMC8363782/ /pubmed/34157094 http://dx.doi.org/10.1093/jamia/ocab090 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle	Research and Applications Zhang, Yuhao Zhang, Yuhui Qi, Peng Manning, Christopher D Langlotz, Curtis P Biomedical and clinical English model packages for the Stanza Python NLP library
title	Biomedical and clinical English model packages for the Stanza Python NLP library
topic	Research and Applications
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8363782/ https://www.ncbi.nlm.nih.gov/pubmed/34157094 http://dx.doi.org/10.1093/jamia/ocab090
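
The abstract in this record describes pipelines that run tokenization, part-of-speech tagging, lemmatization, dependency parsing, and named entity recognition on biomedical and clinical text. The following is a minimal sketch of how such a pipeline might be loaded through Stanza's Python interface. The package names (`craft`, `mimic`) and NER model names (`bionlp13cg`, `i2b2`) are assumptions recalled from the Stanza model documentation, not from this record, and should be checked against the current model list. The helper names `pipeline_kwargs` and `analyze` are hypothetical.

```python
# Sketch only: package and NER model names are assumptions, not taken
# from this catalog record. Verify them against the Stanza model list.
PACKAGES = {
    "biomedical": {"package": "craft", "processors": {"ner": "bionlp13cg"}},
    "clinical": {"package": "mimic", "processors": {"ner": "i2b2"}},
}


def pipeline_kwargs(domain, lang="en"):
    """Build keyword arguments for stanza.Pipeline for a given domain."""
    if domain not in PACKAGES:
        raise ValueError(f"unknown domain: {domain!r}")
    cfg = PACKAGES[domain]
    return {
        "lang": lang,
        "package": cfg["package"],
        # Copy so callers cannot mutate the shared configuration.
        "processors": dict(cfg["processors"]),
    }


def analyze(text, domain):
    """Run the pipeline and return per-word annotations and entities."""
    # Imported lazily so pipeline_kwargs works without Stanza installed.
    import stanza

    kwargs = pipeline_kwargs(domain)
    # Fetches the models on first use; requires network access.
    stanza.download(
        kwargs["lang"],
        package=kwargs["package"],
        processors=kwargs["processors"],
    )
    nlp = stanza.Pipeline(**kwargs)
    doc = nlp(text)
    words = [
        (w.text, w.upos, w.lemma, w.head, w.deprel)
        for sentence in doc.sentences
        for w in sentence.words
    ]
    entities = [(ent.text, ent.type) for ent in doc.entities]
    return words, entities
```

Calling `analyze("The patient had a sore throat and was treated with Cepacol lozenges.", "clinical")` would download the models and return one tuple per word plus the recognized entity spans. The exact labels depend on the installed model versions.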
