A Disease Identification Algorithm for Medical Crowdfunding Campaigns: Validation Study
BACKGROUND: Web-based crowdfunding has become a popular method to raise money for medical expenses, and there is growing research interest in this topic. However, crowdfunding data are largely composed of unstructured text, thereby posing many challenges for researchers hoping to answer questions ab...
Main Authors: | Doerstling, Steven S, Akrobetu, Dennis, Engelhard, Matthew M, Chen, Felicia, Ubel, Peter A |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2022 |
Subjects: | Original Paper |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9257615/ https://www.ncbi.nlm.nih.gov/pubmed/35727610 http://dx.doi.org/10.2196/32867 |
_version_ | 1784741373874798592 |
---|---|
author | Doerstling, Steven S Akrobetu, Dennis Engelhard, Matthew M Chen, Felicia Ubel, Peter A |
collection | PubMed |
description | BACKGROUND: Web-based crowdfunding has become a popular method to raise money for medical expenses, and there is growing research interest in this topic. However, crowdfunding data are largely composed of unstructured text, thereby posing many challenges for researchers hoping to answer questions about specific medical conditions. Previous studies have used methods that either failed to address major challenges or were poorly scalable to large sample sizes. To enable further research on this emerging funding mechanism in health care, better methods are needed. OBJECTIVE: We sought to validate an algorithm for identifying 11 disease categories in web-based medical crowdfunding campaigns. We hypothesized that a disease identification algorithm combining a named entity recognition (NER) model and word search approach could identify disease categories with high precision and accuracy. Such an algorithm would facilitate further research using these data. METHODS: Web scraping was used to collect data on medical crowdfunding campaigns from GoFundMe (GoFundMe Inc). Using pretrained NER and entity resolution models from Spark NLP for Healthcare in combination with targeted keyword searches, we constructed an algorithm to identify conditions in the campaign descriptions, translate conditions to International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) codes, and predict the presence or absence of 11 disease categories in the campaigns. The classification performance of the algorithm was evaluated against 400 manually labeled campaigns. RESULTS: We collected data on 89,645 crowdfunding campaigns through web scraping. The interrater reliability for detecting the presence of broad disease categories in the campaign descriptions was high (Cohen κ: range 0.69-0.96). The NER and entity resolution models identified 6594 unique (276,020 total) ICD-10-CM codes among all of the crowdfunding campaigns in our sample. Through our word search, we identified 3261 additional campaigns for which a medical condition was not otherwise detected with the NER model. When averaged across all disease categories and weighted by the number of campaigns that mentioned each disease category, the algorithm demonstrated an overall precision of 0.83 (range 0.48-0.97), a recall of 0.77 (range 0.42-0.98), an F1 score of 0.78 (range 0.56-0.96), and an accuracy of 95% (range 90%-98%). CONCLUSIONS: A disease identification algorithm combining pretrained natural language processing models and ICD-10-CM code–based disease categorization was able to detect 11 disease categories in medical crowdfunding campaigns with high precision and accuracy. |
format | Online Article Text |
id | pubmed-9257615 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-9257615 2022-07-07 A Disease Identification Algorithm for Medical Crowdfunding Campaigns: Validation Study Doerstling, Steven S Akrobetu, Dennis Engelhard, Matthew M Chen, Felicia Ubel, Peter A J Med Internet Res Original Paper JMIR Publications 2022-06-21 /pmc/articles/PMC9257615/ /pubmed/35727610 http://dx.doi.org/10.2196/32867 Text en ©Steven S Doerstling, Dennis Akrobetu, Matthew M Engelhard, Felicia Chen, Peter A Ubel. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 21.06.2022. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included. |
title | A Disease Identification Algorithm for Medical Crowdfunding Campaigns: Validation Study |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9257615/ https://www.ncbi.nlm.nih.gov/pubmed/35727610 http://dx.doi.org/10.2196/32867 |
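The abstract in this record reports overall precision, recall, and F1 score as averages across disease categories, weighted by the number of campaigns that mentioned each category. The following is a minimal Python sketch of how such a support-weighted average can be computed; it is not the authors' code. The class `Counts`, the function `weighted_average`, the category names, and every count below are hypothetical and invented for illustration only, and the sketch assumes that "mentioned" refers to the manually labeled positives (true positives plus false negatives).

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Counts:
    """Confusion-matrix counts for one disease category (hypothetical helper)."""
    tp: int  # algorithm and manual label both say "present"
    fp: int  # algorithm says "present", manual label says "absent"
    fn: int  # algorithm says "absent", manual label says "present"
    tn: int  # algorithm and manual label both say "absent"

    @property
    def support(self) -> int:
        # Assumption: weight = number of labeled campaigns mentioning the category.
        return self.tp + self.fn


def precision(c: Counts) -> float:
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Counts) -> float:
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f1(c: Counts) -> float:
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(c: Counts) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def weighted_average(per_category: Dict[str, Counts],
                     metric: Callable[[Counts], float]) -> float:
    """Average a per-category metric, weighting each category by its support."""
    total_support = sum(c.support for c in per_category.values())
    return sum(metric(c) * c.support for c in per_category.values()) / total_support


# Invented counts for two categories over 400 labeled campaigns each;
# these are NOT results from the study.
example = {
    "category_a": Counts(tp=90, fp=10, fn=10, tn=290),
    "category_b": Counts(tp=20, fp=20, fn=30, tn=330),
}

for name, metric in [("precision", precision), ("recall", recall),
                     ("F1", f1), ("accuracy", accuracy)]:
    print(f"weighted {name}: {weighted_average(example, metric):.3f}")
```

Weighting by support means that frequently mentioned categories dominate the overall figure, which is why the abstract also reports the per-category range alongside each average.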