
Comparison of Methods for Estimating Temporal Topic Models From Primary Care Clinical Text Data: Retrospective Closed Cohort Study

BACKGROUND: Health care organizations are collecting increasing volumes of clinical text data. Topic models are a class of unsupervised machine learning algorithms for discovering latent thematic patterns in these large unstructured document collections. OBJECTIVE: We aimed to comparatively evaluate...


Bibliographic Details
Main Authors:	Meaney, Christopher; Escobar, Michael; Stukel, Therese A; Austin, Peter C; Jaakkimainen, Liisa
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2022
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9808604/ https://www.ncbi.nlm.nih.gov/pubmed/36534443 http://dx.doi.org/10.2196/40102

author	Meaney, Christopher; Escobar, Michael; Stukel, Therese A; Austin, Peter C; Jaakkimainen, Liisa
collection	PubMed
description	BACKGROUND: Health care organizations are collecting increasing volumes of clinical text data. Topic models are a class of unsupervised machine learning algorithms for discovering latent thematic patterns in these large unstructured document collections. OBJECTIVE: We aimed to comparatively evaluate several methods for estimating temporal topic models using clinical notes obtained from primary care electronic medical records from Ontario, Canada. METHODS: We used a retrospective closed cohort design. The study spanned from January 01, 2011, through December 31, 2015, discretized into 20 quarterly periods. Patients were included in the study if they generated at least 1 primary care clinical note in each of the 20 quarterly periods. These patients represented a unique cohort of individuals engaging in high-frequency use of the primary care system. The following temporal topic modeling algorithms were fitted to the clinical note corpus: nonnegative matrix factorization, latent Dirichlet allocation, the structural topic model, and the BERTopic model. RESULTS: Temporal topic models consistently identified latent topical patterns in the clinical note corpus. The learned topical bases identified meaningful activities conducted by the primary health care system. Latent topics displaying near-constant temporal dynamics were consistently estimated across models (eg, pain, hypertension, diabetes, sleep, mood, anxiety, and depression). Several topics displayed predictable seasonal patterns over the study period (eg, respiratory disease and influenza immunization programs). 
CONCLUSIONS: Nonnegative matrix factorization, latent Dirichlet allocation, structural topic model, and BERTopic are based on different underlying statistical frameworks (eg, linear algebra and optimization, Bayesian graphical models, and neural embeddings), require tuning unique hyperparameters (optimizers, priors, etc), and have distinct computational requirements (data structures, computational hardware, etc). Despite the heterogeneity in statistical methodology, the learned latent topical summarizations and their temporal evolution over the study period were consistently estimated. Temporal topic models represent an interesting class of models for characterizing and monitoring the primary health care system.
format	Online Article Text
id	pubmed-9808604
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-9808604 2023-01-04 JMIR Med Inform Original Paper JMIR Publications 2022-12-19 /pmc/articles/PMC9808604/ /pubmed/36534443 http://dx.doi.org/10.2196/40102 Text en ©Christopher Meaney, Michael Escobar, Therese A Stukel, Peter C Austin, Liisa Jaakkimainen. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 19.12.2022. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title	Comparison of Methods for Estimating Temporal Topic Models From Primary Care Clinical Text Data: Retrospective Closed Cohort Study
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9808604/ https://www.ncbi.nlm.nih.gov/pubmed/36534443 http://dx.doi.org/10.2196/40102
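
The cohort inclusion rule stated in the description (at least 1 primary care clinical note in each of the 20 quarterly periods between January 01, 2011, and December 31, 2015) can be illustrated with a small sketch. This is not the authors' code: the function names, the `(patient_id, note_date)` input shape, and the toy data are hypothetical, and the sketch assumes the 20 periods are calendar quarters.

```python
from datetime import date

# Study window as given in the record's description.
STUDY_START = date(2011, 1, 1)
STUDY_END = date(2015, 12, 31)
N_QUARTERS = 20


def quarter_index(d):
    """Map a date inside the study window to a quarter index 0..19.

    Assumption: periods are calendar quarters. Returns None for dates
    outside the study window.
    """
    if not (STUDY_START <= d <= STUDY_END):
        return None
    return (d.year - STUDY_START.year) * 4 + (d.month - 1) // 3


def closed_cohort(notes):
    """Return the ids of patients with at least 1 note in every quarter.

    `notes` is an iterable of (patient_id, note_date) pairs
    (a hypothetical input format chosen for illustration).
    """
    quarters_seen = {}
    for patient_id, note_date in notes:
        q = quarter_index(note_date)
        if q is not None:
            quarters_seen.setdefault(patient_id, set()).add(q)
    return {pid for pid, qs in quarters_seen.items() if len(qs) == N_QUARTERS}


def mid_quarter_date(q):
    """Hypothetical helper: a date on day 15 of the first month of quarter q."""
    return date(2011 + q // 4, 3 * (q % 4) + 1, 15)


# Toy data: patient "A" has a note in all 20 quarters; patient "B" misses
# the final quarter and has one note after the study window ends.
notes = [("A", mid_quarter_date(q)) for q in range(20)]
notes += [("B", mid_quarter_date(q)) for q in range(19)]
notes.append(("B", date(2016, 1, 5)))

cohort = closed_cohort(notes)
```

In this toy example only patient "A" satisfies the rule; patient "B" is excluded because the out-of-window note does not count toward any of the 20 periods.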
