Improving Model Transferability for Clinical Note Section Classification Models Using Continued Pretraining
OBJECTIVE: The classification of clinical note sections is a critical step before more fine-grained natural language processing tasks such as social determinants of health extraction and temporal information extraction. Often, clinical note section classification models that achieve high accuracy at one institution suffer a large drop in accuracy when transferred to another institution. The objective of this study is to develop methods that classify clinical note sections under the SOAP (“Subjective”, “Objective”, “Assessment”, and “Plan”) framework with improved transferability. MATERIALS AND METHODS: We trained baseline models by fine-tuning BERT-based models and enhanced their transferability with continued pretraining, including domain-adaptive pretraining (DAPT) and task-adaptive pretraining (TAPT). We added out-of-domain annotated samples during fine-tuning and observed model performance over varying annotated sample sizes. Finally, we quantified the impact of continued pretraining as the equivalent number of added in-domain annotated samples. RESULTS: We found that continued pretraining improved models only when combined with in-domain annotated samples, improving the F1 score from 0.756 to 0.808, averaged across three datasets. This improvement was equivalent to adding 50.2 in-domain annotated samples. DISCUSSION: Although considered straightforward when performed in-domain, section classification remains considerably difficult when performed cross-domain, even with highly sophisticated neural network-based methods. CONCLUSION: Continued pretraining improved model transferability for cross-domain clinical note section classification in the presence of a small number of in-domain labeled samples.
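For readers unfamiliar with the recipe summarized above, the sketch below illustrates the general TAPT-then-fine-tune pattern using the HuggingFace transformers library: continued masked-language-model pretraining on unlabeled target-domain section text, followed by fine-tuning the adapted encoder as a four-way SOAP section classifier. This is not the paper's code; the base model (bert-base-uncased), the file name target_sections.txt, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of task-adaptive pretraining (TAPT) followed by
# fine-tuning for SOAP section classification. All names, paths, and
# hyperparameters are illustrative assumptions, not the paper's setup.
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

base = "bert-base-uncased"  # stand-in for any BERT-based encoder
tokenizer = AutoTokenizer.from_pretrained(base)

# Step 1: continued pretraining with masked language modeling on
# unlabeled task-domain text (e.g., raw section text from target notes).
unlabeled = load_dataset("text", data_files={"train": "target_sections.txt"})
tokenized = unlabeled["train"].map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)
mlm_model = AutoModelForMaskedLM.from_pretrained(base)
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="tapt", num_train_epochs=3),
    train_dataset=tokenized,
    data_collator=collator,
).train()
mlm_model.save_pretrained("tapt")
tokenizer.save_pretrained("tapt")

# Step 2: fine-tune the adapted encoder as a classifier over the four
# SOAP labels (Subjective, Objective, Assessment, Plan).
clf = AutoModelForSequenceClassification.from_pretrained("tapt", num_labels=4)
# ... then train `clf` with Trainer on labeled section data, mixing in
# the small number of in-domain annotated samples the study describes.
```

Note that DAPT follows the same two-step shape; the difference is that step 1 runs over a broad domain corpus (e.g., clinical notes generally) rather than text drawn from the task itself.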
Main Authors: | Zhou, Weipeng; Yetisgen, Meliha; Afshar, Majid; Gao, Yanjun; Savova, Guergana; Miller, Timothy A. |
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory, 2023 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10168403/ https://www.ncbi.nlm.nih.gov/pubmed/37162963 http://dx.doi.org/10.1101/2023.04.15.23288628 |
collection | PubMed |
id | pubmed-10168403 |
institution | National Center for Biotechnology Information |
publishDate | 2023-04-24
record_format | MEDLINE/PubMed |
license | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |