Automated Classification of Online Sources for Infectious Disease Occurrences Using Machine-Learning-Based Natural Language Processing Approaches

Collecting valid information from electronic sources to detect the potential outbreak of infectious disease is time-consuming and labor-intensive. The automated identification of relevant information using machine learning is necessary to respond to a potential disease outbreak. A total of 2864 docu...

Bibliographic Details
Main Authors:	Kim, Mira; Chae, Kyunghee; Lee, Seungwoo; Jang, Hong-Jun; Kim, Sukil
Format:	Online Article Text
Language:	English
Published:	MDPI 2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7766498/ https://www.ncbi.nlm.nih.gov/pubmed/33348764 http://dx.doi.org/10.3390/ijerph17249467

_version_	1783628733157998592
author	Kim, Mira Chae, Kyunghee Lee, Seungwoo Jang, Hong-Jun Kim, Sukil
author_sort	Kim, Mira
collection	PubMed
description	Collecting valid information from electronic sources to detect a potential outbreak of infectious disease is time-consuming and labor-intensive. Automated identification of relevant information using machine learning is necessary to respond to a potential disease outbreak. A total of 2864 documents were collected from various websites and then manually categorized and labeled by two reviewers. Labels for the training and test data were assigned based on reviewer consensus. Two machine learning algorithms, ConvNet and bidirectional long short-term memory (BiLSTM), and two classification methods, DocClass and SenClass, were used to classify the documents. Precision, recall, F1, accuracy, and area under the curve were measured to evaluate the performance of each model. With DocClass, ConvNet yielded higher average, minimum, and maximum accuracies (87.6%, 85.2%, and 91.1%, respectively) than BiLSTM, while with SenClass, BiLSTM performed better than ConvNet, with average, minimum, and maximum accuracies of 92.8%, 92.6%, and 93.3%, respectively. BiLSTM with SenClass achieved an overall accuracy of 92.9% in classifying infectious disease occurrences. Given a particular text extraction system, machine learning performed comparably to a human expert. This study suggests that analyzing information from websites using machine learning can achieve significant accuracy when abundant articles/documents are available.
format	Online Article Text
id	pubmed-7766498
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-7766498 2020-12-28 Kim, Mira Chae, Kyunghee Lee, Seungwoo Jang, Hong-Jun Kim, Sukil Int J Environ Res Public Health Article MDPI 2020-12-17 2020-12 /pmc/articles/PMC7766498/ /pubmed/33348764 http://dx.doi.org/10.3390/ijerph17249467 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title	Automated Classification of Online Sources for Infectious Disease Occurrences Using Machine-Learning-Based Natural Language Processing Approaches
topic	Article
