
A Gold-Standard for Entity Resolution within Sexually Transmitted Infection Networks

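Contact tracing for venereal disease control has been widespread since 1936 and relies on reported information about contacts’ attributes to determine whether two contacts may represent the same individual. We developed and implemented a gold-standard for determining overlap between contacts reported by different individuals using cell phone numbers as unique identifiers. This method was then used to evaluate the performance of using reported names and demographic characteristics to infer overlap. Cell-phone numbers, names and demographic data for a sample of high-risk men in India and their contacts were collected using a novel, hybrid instrument involving both cell-phone data extraction and Computer-Assisted Personal Interviewing (CAPI). Logistic regression was used to model the probability that a pair of contacts reported by different respondents were identical, based on the correspondence between their reported names and attributes. A discrete mixture model is proposed which provides predictions nearly as good as the logistic model but may be used in a new population without re-calibration. Despite achieving AUCs of 0.83–0.86, the low rate of true overlap among a very large number of contact pairs still results in a high rate of false positives. Next generation contact tracing calls for more archived or digital matching processes.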
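The abstract for this record notes that AUCs of 0.83–0.86 still leave a high rate of false positives because true overlap is rare among a very large number of contact pairs. The arithmetic behind that kind of result can be sketched as below. This is only an illustration of the base-rate effect, not the authors' analysis: the function name, the number of pairs, the overlap rate, and the sensitivity and specificity values are all hypothetical and are not taken from the article.

```python
def expected_matches(n_pairs, overlap_rate, sensitivity, specificity):
    """Expected true/false positives when classifying contact pairs.

    All inputs are hypothetical illustration values, not study results.
    """
    true_pairs = n_pairs * overlap_rate        # pairs that really are the same person
    non_pairs = n_pairs - true_pairs           # pairs that are different people
    tp = sensitivity * true_pairs              # true overlaps flagged as matches
    fp = (1.0 - specificity) * non_pairs       # non-overlaps wrongly flagged as matches
    flagged = tp + fp
    precision = tp / flagged if flagged > 0 else 0.0
    return tp, fp, precision


# Hypothetical scenario: one million candidate pairs, 0.1% true overlap,
# and a classifier that is right 80% of the time on each class.
tp, fp, precision = expected_matches(
    n_pairs=1_000_000, overlap_rate=0.001, sensitivity=0.80, specificity=0.80
)
print(f"true positives:  {tp:,.0f}")
print(f"false positives: {fp:,.0f}")
print(f"share of flagged pairs that are real overlaps: {precision:.2%}")
```

Under these assumed numbers, false positives outnumber true positives by a wide margin even though the classifier performs reasonably on each class, which is the pattern the abstract describes.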


Bibliographic Details
Main authors: Schneider, John; Schumm, L. Philip; Fraser, Maya; Yeldandi, Vijay; Liao, Chuanhong
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5993735/
https://www.ncbi.nlm.nih.gov/pubmed/29884882
http://dx.doi.org/10.1038/s41598-018-26794-7
_version_ 1783330269523083264
author Schneider, John
Schumm, L. Philip
Fraser, Maya
Yeldandi, Vijay
Liao, Chuanhong
collection PubMed
description Contact tracing for venereal disease control has been widespread since 1936 and relies on reported information about contacts’ attributes to determine whether two contacts may represent the same individual. We developed and implemented a gold-standard for determining overlap between contacts reported by different individuals using cell phone numbers as unique identifiers. This method was then used to evaluate the performance of using reported names and demographic characteristics to infer overlap. Cell-phone numbers, names and demographic data for a sample of high-risk men in India and their contacts were collected using a novel, hybrid instrument involving both cell-phone data extraction and Computer-Assisted Personal Interviewing (CAPI). Logistic regression was used to model the probability that a pair of contacts reported by different respondents were identical, based on the correspondence between their reported names and attributes. A discrete mixture model is proposed which provides predictions nearly as good as the logistic model but may be used in a new population without re-calibration. Despite achieving AUCs of 0.83–0.86, the low rate of true overlap among a very large number of contact pairs still results in a high rate of false positives. Next generation contact tracing calls for more archived or digital matching processes.
format Online
Article
Text
id pubmed-5993735
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-5993735 2018-07-05 A Gold-Standard for Entity Resolution within Sexually Transmitted Infection Networks Schneider, John; Schumm, L. Philip; Fraser, Maya; Yeldandi, Vijay; Liao, Chuanhong Sci Rep Article
Nature Publishing Group UK 2018-06-08 /pmc/articles/PMC5993735/ /pubmed/29884882 http://dx.doi.org/10.1038/s41598-018-26794-7 Text en © The Author(s) 2018 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title A Gold-Standard for Entity Resolution within Sexually Transmitted Infection Networks
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5993735/
https://www.ncbi.nlm.nih.gov/pubmed/29884882
http://dx.doi.org/10.1038/s41598-018-26794-7