Toward Using Twitter for PrEP-Related Interventions: An Automated Natural Language Processing Pipeline for Identifying Gay or Bisexual Men in the United States
BACKGROUND: Pre-exposure prophylaxis (PrEP) is highly effective at preventing the acquisition of HIV. There is a substantial gap, however, between the number of people in the United States who have indications for PrEP and the number of them who are prescribed PrEP. Although Twitter content has been...
Main authors: | Klein, Ari Z; Meanley, Steven; O'Connor, Karen; Bauermeister, José A; Gonzalez-Hernandez, Graciela |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9086871/ https://www.ncbi.nlm.nih.gov/pubmed/35468092 http://dx.doi.org/10.2196/32405 |
_version_ | 1784704100056694784 |
---|---|
author | Klein, Ari Z; Meanley, Steven; O'Connor, Karen; Bauermeister, José A; Gonzalez-Hernandez, Graciela |
author_sort | Klein, Ari Z |
collection | PubMed |
description | BACKGROUND: Pre-exposure prophylaxis (PrEP) is highly effective at preventing the acquisition of HIV. There is a substantial gap, however, between the number of people in the United States who have indications for PrEP and the number of them who are prescribed PrEP. Although Twitter content has been analyzed as a source of PrEP-related data (eg, barriers), methods have not been developed to enable the use of Twitter as a platform for implementing PrEP-related interventions. OBJECTIVE: Men who have sex with men (MSM) are the population most affected by HIV in the United States. Therefore, the objectives of this study were to (1) develop an automated natural language processing (NLP) pipeline for identifying men in the United States who have reported on Twitter that they are gay, bisexual, or MSM and (2) assess the extent to which they demographically represent MSM in the United States with new HIV diagnoses. METHODS: Between September 2020 and January 2021, we used the Twitter Streaming Application Programming Interface (API) to collect more than 3 million tweets containing keywords that men may include in posts reporting that they are gay, bisexual, or MSM. We deployed handwritten, high-precision regular expressions, designed to filter out noise and identify actual self-reports, on the tweets and their user profile metadata. We identified 10,043 unique users geolocated in the United States and drew upon a validated NLP tool to automatically identify their ages. RESULTS: By manually distinguishing true- and false-positive self-reports in the tweets or profiles of 1000 (10%) of the 10,043 users identified by our automated pipeline, we established that our pipeline has a precision of 0.85. Among the 8756 users for whom a US state–level geolocation was detected, 5096 (58.2%) were in the 10 states with the highest numbers of new HIV diagnoses. Among the 6240 users for whom a county-level geolocation was detected, 4252 (68.1%) were in counties or states considered priority jurisdictions by the Ending the HIV Epidemic initiative. Furthermore, the age distribution of the users reflected that of MSM in the United States with new HIV diagnoses. CONCLUSIONS: Our automated NLP pipeline can be used to identify MSM in the United States who may be at risk of acquiring HIV, laying the groundwork for using Twitter on a large scale to directly target PrEP-related interventions at this population. |
format | Online Article Text |
id | pubmed-9086871 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-9086871 2022-05-11 Toward Using Twitter for PrEP-Related Interventions: An Automated Natural Language Processing Pipeline for Identifying Gay or Bisexual Men in the United States. Klein, Ari Z; Meanley, Steven; O'Connor, Karen; Bauermeister, José A; Gonzalez-Hernandez, Graciela. JMIR Public Health Surveill. Original Paper. JMIR Publications 2022-04-25 /pmc/articles/PMC9086871/ /pubmed/35468092 http://dx.doi.org/10.2196/32405 Text en ©Ari Z Klein, Steven Meanley, Karen O'Connor, José A Bauermeister, Graciela Gonzalez-Hernandez. Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 25.04.2022. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included. |
title | Toward Using Twitter for PrEP-Related Interventions: An Automated Natural Language Processing Pipeline for Identifying Gay or Bisexual Men in the United States |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9086871/ https://www.ncbi.nlm.nih.gov/pubmed/35468092 http://dx.doi.org/10.2196/32405 |
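
The percentages quoted in the RESULTS portion of the description can be re-derived from the counts given alongside them. The following is a minimal Python sketch of that arithmetic, not the authors' code: the helper names `percent` and `precision` are illustrative, and the true-positive and false-positive counts passed to `precision` are hypothetical, since the abstract reports only the resulting value of 0.85 for the 1000 reviewed users.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def precision(true_positives: int, false_positives: int) -> float:
    """Precision = TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)


# Counts taken from the abstract: (subset, total).
reported = {
    "manually reviewed sample": (1000, 10043),
    "users in the 10 highest-diagnosis states": (5096, 8756),
    "users in priority jurisdictions": (4252, 6240),
}

shares = {label: percent(part, whole) for label, (part, whole) in reported.items()}

for label, share in shares.items():
    print(f"{label}: {share}%")

# Hypothetical split of the 1000 reviewed users, chosen only so that the
# result matches the reported precision; the abstract does not give counts.
print(precision(true_positives=850, false_positives=150))
```

The first share comes out just under 10%, which agrees with the rounded "1000 (10%)" in the abstract.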