VetTag: improving automated veterinary diagnosis coding via large-scale language modeling

Unlike human medical records, most of the veterinary records are free text without standard diagnosis coding. The lack of systematic coding is a major barrier to the growing interest in leveraging veterinary records for public health and translational research. Recent machine learning effort is limi...

Bibliographic Details
Main Authors:	Zhang, Yuhui; Nie, Allen; Zehnder, Ashley; Page, Rodney L.; Zou, James
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2019
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6550141/ https://www.ncbi.nlm.nih.gov/pubmed/31304381 http://dx.doi.org/10.1038/s41746-019-0113-1

author	Zhang, Yuhui; Nie, Allen; Zehnder, Ashley; Page, Rodney L.; Zou, James
author_sort	Zhang, Yuhui
collection	PubMed
description	Unlike human medical records, most of the veterinary records are free text without standard diagnosis coding. The lack of systematic coding is a major barrier to the growing interest in leveraging veterinary records for public health and translational research. Recent machine learning effort is limited to predicting 42 top-level diagnosis categories from veterinary notes. Here we develop a large-scale algorithm to automatically predict all 4577 standard veterinary diagnosis codes from free text. We train our algorithm on a curated dataset of over 100 K expert labeled veterinary notes and over one million unlabeled notes. Our algorithm is based on the adapted Transformer architecture and we demonstrate that large-scale language modeling on the unlabeled notes via pretraining and as an auxiliary objective during supervised learning greatly improves performance. We systematically evaluate the performance of the model and several baselines in challenging settings where algorithms trained on one hospital are evaluated in a different hospital with substantial domain shift. In addition, we show that hierarchical training can address severe data imbalances for fine-grained diagnosis with a few training cases, and we provide interpretation for what is learned by the deep network. Our algorithm addresses an important challenge in veterinary medicine, and our model and experiments add insights into the power of unsupervised learning for clinical natural language processing.
format	Online Article Text
id	pubmed-6550141
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-6550141 2019-07-12 VetTag: improving automated veterinary diagnosis coding via large-scale language modeling. Zhang, Yuhui; Nie, Allen; Zehnder, Ashley; Page, Rodney L.; Zou, James. NPJ Digit Med. Article. Nature Publishing Group UK 2019-05-08 /pmc/articles/PMC6550141/ /pubmed/31304381 http://dx.doi.org/10.1038/s41746-019-0113-1 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title	VetTag: improving automated veterinary diagnosis coding via large-scale language modeling
topic	Article
