
Epidemiologic information discovery from open-access COVID-19 case reports via pretrained language model


Bibliographic Details
Main authors:	Wang, Zhizheng, Liu, Xiao Fan, Du, Zhanwei, Wang, Lin, Wu, Ye, Holme, Petter, Lachmann, Michael, Lin, Hongfei, Wong, Zoie S.Y., Xu, Xiao-Ke, Sun, Yuanyuan
Format:	Online Article Text
Language:	English
Published:	Elsevier 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9441477/ https://www.ncbi.nlm.nih.gov/pubmed/36093379 http://dx.doi.org/10.1016/j.isci.2022.105079

Description
Summary:	Although open-access data are increasingly common and useful to epidemiological research, the curation of such datasets is resource-intensive and time-consuming. Regularly disclosed case reports are a major source of COVID-19 data, but they are often written in natural language with an unstructured format. Here, we propose a computational framework that can automatically extract epidemiological information from open-access COVID-19 case reports. We develop this framework by coupling a language model developed using deep neural networks with training samples compiled using an optimized data annotation strategy. When applied to the COVID-19 case reports collected from mainland China, our framework outperforms all other state-of-the-art deep learning models. The information extracted by our approach is highly consistent with that obtained from gold-standard manual coding, with a matching rate of 80%. To disseminate our algorithm, we provide an open-access online platform that can estimate key epidemiological statistics in real time with much less effort for data curation.
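The summary reports an 80% matching rate against gold-standard manual coding but does not define the metric in this record. A minimal sketch follows, assuming (not confirmed by this record) that the rate is a field-level exact-match agreement: the share of (case, field) pairs where the automatically extracted value equals the manually coded value. The function names, field names, and sample values below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

# Hypothetical record type: field name -> value (None when the field is absent).
Record = Dict[str, Optional[str]]


def normalize(value: Optional[str]) -> Optional[str]:
    """Trim whitespace and lowercase so formatting differences are not counted as mismatches."""
    return None if value is None else value.strip().lower()


def matching_rate(extracted: List[Record], gold: List[Record], fields: List[str]) -> float:
    """Fraction of (case, field) pairs where extracted output equals manual coding.

    Assumes extracted[i] and gold[i] describe the same case report.
    A field missing from both records counts as a match.
    """
    if len(extracted) != len(gold):
        raise ValueError("extracted and gold must cover the same cases")
    total = 0
    matched = 0
    for pred, ref in zip(extracted, gold):
        for field in fields:
            total += 1
            if normalize(pred.get(field)) == normalize(ref.get(field)):
                matched += 1
    return matched / total if total else 0.0


# Illustrative (made-up) data: two case reports, three hypothetical fields.
fields = ["age", "sex", "onset_date"]
gold = [
    {"age": "45", "sex": "male", "onset_date": "2020-01-20"},
    {"age": "32", "sex": "female", "onset_date": "2020-01-22"},
]
extracted = [
    {"age": "45", "sex": "Male", "onset_date": "2020-01-20"},
    {"age": "32", "sex": "female", "onset_date": None},
]

if __name__ == "__main__":
    print(f"matching rate: {matching_rate(extracted, gold, fields):.3f}")
```

Exact matching after light normalization is the strictest simple choice; a real evaluation might instead use per-field precision and recall, or fuzzy matching for free-text fields.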

