VetTag: improving automated veterinary diagnosis coding via large-scale language modeling

Unlike human medical records, most of the veterinary records are free text without standard diagnosis coding. The lack of systematic coding is a major barrier to the growing interest in leveraging veterinary records for public health and translational research. Recent machine learning effort is limi...

Bibliographic Details
Main Authors: Zhang, Yuhui; Nie, Allen; Zehnder, Ashley; Page, Rodney L.; Zou, James
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6550141/
https://www.ncbi.nlm.nih.gov/pubmed/31304381
http://dx.doi.org/10.1038/s41746-019-0113-1
_version_ 1783424136368881664
author Zhang, Yuhui
Nie, Allen
Zehnder, Ashley
Page, Rodney L.
Zou, James
author_facet Zhang, Yuhui
Nie, Allen
Zehnder, Ashley
Page, Rodney L.
Zou, James
author_sort Zhang, Yuhui
collection PubMed
description Unlike human medical records, most of the veterinary records are free text without standard diagnosis coding. The lack of systematic coding is a major barrier to the growing interest in leveraging veterinary records for public health and translational research. Recent machine learning effort is limited to predicting 42 top-level diagnosis categories from veterinary notes. Here we develop a large-scale algorithm to automatically predict all 4577 standard veterinary diagnosis codes from free text. We train our algorithm on a curated dataset of over 100 K expert labeled veterinary notes and over one million unlabeled notes. Our algorithm is based on the adapted Transformer architecture and we demonstrate that large-scale language modeling on the unlabeled notes via pretraining and as an auxiliary objective during supervised learning greatly improves performance. We systematically evaluate the performance of the model and several baselines in challenging settings where algorithms trained on one hospital are evaluated in a different hospital with substantial domain shift. In addition, we show that hierarchical training can address severe data imbalances for fine-grained diagnosis with a few training cases, and we provide interpretation for what is learned by the deep network. Our algorithm addresses an important challenge in veterinary medicine, and our model and experiments add insights into the power of unsupervised learning for clinical natural language processing.
format Online
Article
Text
id pubmed-6550141
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-6550141 2019-07-12 VetTag: improving automated veterinary diagnosis coding via large-scale language modeling Zhang, Yuhui Nie, Allen Zehnder, Ashley Page, Rodney L. Zou, James NPJ Digit Med Article
Nature Publishing Group UK 2019-05-08 /pmc/articles/PMC6550141/ /pubmed/31304381 http://dx.doi.org/10.1038/s41746-019-0113-1 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Zhang, Yuhui
Nie, Allen
Zehnder, Ashley
Page, Rodney L.
Zou, James
VetTag: improving automated veterinary diagnosis coding via large-scale language modeling
title VetTag: improving automated veterinary diagnosis coding via large-scale language modeling
title_full VetTag: improving automated veterinary diagnosis coding via large-scale language modeling
title_fullStr VetTag: improving automated veterinary diagnosis coding via large-scale language modeling
title_full_unstemmed VetTag: improving automated veterinary diagnosis coding via large-scale language modeling
title_short VetTag: improving automated veterinary diagnosis coding via large-scale language modeling
title_sort vettag: improving automated veterinary diagnosis coding via large-scale language modeling
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6550141/
https://www.ncbi.nlm.nih.gov/pubmed/31304381
http://dx.doi.org/10.1038/s41746-019-0113-1
work_keys_str_mv AT zhangyuhui vettagimprovingautomatedveterinarydiagnosiscodingvialargescalelanguagemodeling
AT nieallen vettagimprovingautomatedveterinarydiagnosiscodingvialargescalelanguagemodeling
AT zehnderashley vettagimprovingautomatedveterinarydiagnosiscodingvialargescalelanguagemodeling
AT pagerodneyl vettagimprovingautomatedveterinarydiagnosiscodingvialargescalelanguagemodeling
AT zoujames vettagimprovingautomatedveterinarydiagnosiscodingvialargescalelanguagemodeling