
Machine Learning for Risk Group Identification and User Data Collection in a Herpes Simplex Virus Patient Registry: Algorithm Development and Validation Study

BACKGROUND: Researching people with herpes simplex virus (HSV) is challenging because of poor data quality, low user engagement, and concerns around stigma and anonymity. OBJECTIVE: This project aimed to improve data collection for a real-world HSV registry by identifying predictors of HSV infection...


Bibliographic Details
Main authors:	Surodina, Svitlana; Lam, Ching; Grbich, Svetislav; Milne-Ives, Madison; van Velthoven, Michelle; Meinert, Edward
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2021
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10414389/ https://www.ncbi.nlm.nih.gov/pubmed/37725536 http://dx.doi.org/10.2196/25560

_version_	1785087327505219584
author	Surodina, Svitlana; Lam, Ching; Grbich, Svetislav; Milne-Ives, Madison; van Velthoven, Michelle; Meinert, Edward
author_facet	Surodina, Svitlana; Lam, Ching; Grbich, Svetislav; Milne-Ives, Madison; van Velthoven, Michelle; Meinert, Edward
author_sort	Surodina, Svitlana
collection	PubMed
description	BACKGROUND: Researching people with herpes simplex virus (HSV) is challenging because of poor data quality, low user engagement, and concerns around stigma and anonymity. OBJECTIVE: This project aimed to improve data collection for a real-world HSV registry by identifying predictors of HSV infection and selecting a limited number of relevant questions to ask new registry users to determine their level of HSV infection risk. METHODS: The US National Health and Nutrition Examination Survey (NHANES, 2015-2016) database includes the confirmed HSV type 1 and type 2 (HSV-1 and HSV-2, respectively) status of American participants (14-49 years) and a wealth of demographic and health-related data. The questionnaires and data sets from this survey were used to form two data sets: one for HSV-1 and one for HSV-2. These data sets were used to train and test a model that used a random forest algorithm (devised using Python) to minimize the number of anonymous lifestyle-based questions needed to identify risk groups for HSV. RESULTS: The model selected a reduced number of questions from the NHANES questionnaire that predicted HSV infection risk with high accuracy scores of 0.91 and 0.96 and high recall scores of 0.88 and 0.98 for the HSV-1 and HSV-2 data sets, respectively. The number of questions was reduced from 150 to an average of 40, depending on age and gender. The model, therefore, provided high predictability of risk of infection with minimal required input. CONCLUSIONS: This machine learning algorithm can be used in a real-world evidence registry to collect relevant lifestyle data and identify individuals’ levels of risk of HSV infection. A limitation is the absence of real user data and integration with electronic medical records, which would enable model learning and improvement. 
Future work will explore model adjustments, anonymization options, explicit permissions, and a standardized data schema that meet the General Data Protection Regulation, Health Insurance Portability and Accountability Act, and third-party interface connectivity requirements.
format	Online Article Text
id	pubmed-10414389
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-10414389 2023-09-12 Machine Learning for Risk Group Identification and User Data Collection in a Herpes Simplex Virus Patient Registry: Algorithm Development and Validation Study. Surodina, Svitlana; Lam, Ching; Grbich, Svetislav; Milne-Ives, Madison; van Velthoven, Michelle; Meinert, Edward. JMIRx Med. Original Paper. JMIR Publications 2021-06-11 /pmc/articles/PMC10414389/ /pubmed/37725536 http://dx.doi.org/10.2196/25560 Text en ©Svitlana Surodina, Ching Lam, Svetislav Grbich, Madison Milne-Ives, Michelle van Velthoven, Edward Meinert. Originally published in JMIRx Med (https://med.jmirx.org), 11.06.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIRx Med, is properly cited. The complete bibliographic information, a link to the original publication on https://med.jmirx.org/, as well as this copyright and license information must be included.
title	Machine Learning for Risk Group Identification and User Data Collection in a Herpes Simplex Virus Patient Registry: Algorithm Development and Validation Study
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10414389/ https://www.ncbi.nlm.nih.gov/pubmed/37725536 http://dx.doi.org/10.2196/25560
