VetTag: improving automated veterinary diagnosis coding via large-scale language modeling
Main authors: | Zhang, Yuhui; Nie, Allen; Zehnder, Ashley; Page, Rodney L.; Zou, James |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2019 |
Subjects: | Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6550141/ https://www.ncbi.nlm.nih.gov/pubmed/31304381 http://dx.doi.org/10.1038/s41746-019-0113-1 |
author | Zhang, Yuhui; Nie, Allen; Zehnder, Ashley; Page, Rodney L.; Zou, James |
collection | PubMed |
description | Unlike human medical records, most veterinary records are free text without standard diagnosis coding. The lack of systematic coding is a major barrier to the growing interest in leveraging veterinary records for public health and translational research. Recent machine learning efforts have been limited to predicting 42 top-level diagnosis categories from veterinary notes. Here we develop a large-scale algorithm to automatically predict all 4577 standard veterinary diagnosis codes from free text. We train our algorithm on a curated dataset of over 100K expert-labeled veterinary notes and over one million unlabeled notes. Our algorithm is based on an adapted Transformer architecture, and we demonstrate that large-scale language modeling on the unlabeled notes, both via pretraining and as an auxiliary objective during supervised learning, greatly improves performance. We systematically evaluate the performance of the model and several baselines in challenging settings where algorithms trained on data from one hospital are evaluated at a different hospital with substantial domain shift. In addition, we show that hierarchical training can address severe data imbalances for fine-grained diagnoses with few training cases, and we provide an interpretation of what the deep network learns. Our algorithm addresses an important challenge in veterinary medicine, and our model and experiments add insights into the power of unsupervised learning for clinical natural language processing. |
format | Online Article Text |
id | pubmed-6550141 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-6550141, 2019-07-12. VetTag: improving automated veterinary diagnosis coding via large-scale language modeling. Zhang, Yuhui; Nie, Allen; Zehnder, Ashley; Page, Rodney L.; Zou, James. NPJ Digit Med. Article. Nature Publishing Group UK, 2019-05-08. /pmc/articles/PMC6550141/ /pubmed/31304381 http://dx.doi.org/10.1038/s41746-019-0113-1 Text, en. © The Author(s) 2019. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | VetTag: improving automated veterinary diagnosis coding via large-scale language modeling |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6550141/ https://www.ncbi.nlm.nih.gov/pubmed/31304381 http://dx.doi.org/10.1038/s41746-019-0113-1 |
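The description field above states that language modeling on unlabeled notes is used "as an auxiliary objective during supervised learning" alongside predicting the 4577 diagnosis codes. The record does not give the exact loss, so the following is only a minimal sketch of what such a joint objective can look like, under stated assumptions: each diagnosis code is treated as an independent binary label (multi-label binary cross-entropy on logits), the auxiliary term is a standard next-token cross-entropy, and the two are summed with a weight. The function names and the weight `lm_weight` are hypothetical and are not taken from the paper. It is written in plain Python so it runs without a deep learning framework.

```python
import math
from typing import List, Sequence


def multilabel_bce(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over independent code labels (assumed multi-label setup)."""
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable BCE on a logit z: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Log-probabilities over the vocabulary, shifted by the max for stability."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def lm_nll(step_scores: Sequence[Sequence[float]], next_tokens: Sequence[int]) -> float:
    """Mean next-token negative log-likelihood (the auxiliary language-modeling loss)."""
    if len(step_scores) != len(next_tokens):
        raise ValueError("one score vector is needed per predicted token")
    total = 0.0
    for scores, tok in zip(step_scores, next_tokens):
        total -= log_softmax(scores)[tok]
    return total / len(next_tokens)


def joint_loss(code_logits, code_targets, step_scores, next_tokens, lm_weight=0.5):
    """Supervised coding loss plus a weighted LM loss; lm_weight is a hypothetical knob."""
    return multilabel_bce(code_logits, code_targets) + lm_weight * lm_nll(step_scores, next_tokens)


if __name__ == "__main__":
    # Toy example: 2 codes with zero logits and one LM step over a 4-token vocabulary.
    print(joint_loss([0.0, 0.0], [1, 0], [[0.0, 0.0, 0.0, 0.0]], [2], lm_weight=0.5))
```

With all-zero logits each code contributes log 2 and a uniform 4-token prediction contributes log 4, so the toy call prints a value of about 1.386 (that is, 2 log 2). Summing a weighted language-modeling term is one common way to realize an auxiliary objective; other weightings or schedules are equally plausible readings of the description.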