
Optimizing Patient Record Linkage in a Master Patient Index Using Machine Learning: Algorithm Development and Validation

BACKGROUND: To provide quality care, modern health care systems must match and link data about the same patient from multiple sources, a function often served by master patient index (MPI) software. Record linkage in the MPI is typically performed manually by health care providers, guided by automat...


Bibliographic Details
Main authors: Nelson, Walter; Khanna, Nityan; Ibrahim, Mohamed; Fyfe, Justin; Geiger, Maxwell; Edwards, Keith; Petch, Jeremy
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10365597/
https://www.ncbi.nlm.nih.gov/pubmed/37384382
http://dx.doi.org/10.2196/44331
author Nelson, Walter
Khanna, Nityan
Ibrahim, Mohamed
Fyfe, Justin
Geiger, Maxwell
Edwards, Keith
Petch, Jeremy
collection PubMed
description BACKGROUND: To provide quality care, modern health care systems must match and link data about the same patient from multiple sources, a function often served by master patient index (MPI) software. Record linkage in the MPI is typically performed manually by health care providers, guided by automated matching algorithms. These matching algorithms must be configured in advance, such as by setting the weights of patient attributes, usually by someone with knowledge of both the matching algorithm and the patient population being served.
OBJECTIVE: We aimed to develop and evaluate a machine learning–based software tool, which automatically configures a patient matching algorithm by learning from pairs of patient records previously linked by humans already present in the database.
METHODS: We built a free and open-source software tool to optimize record linkage algorithm parameters based on historical record linkages. The tool uses Bayesian optimization to identify the set of configuration parameters that lead to optimal matching performance in a given patient population, by learning from prior record linkages by humans. The tool is written assuming only the existence of a minimal HTTP application programming interface (API), and so is agnostic to the choice of MPI software, record linkage algorithm, and patient population. As a proof of concept, we integrated our tool with SantéMPI, an open-source MPI. We validated the tool using several synthetic patient populations in SantéMPI by comparing the performance of the optimized configuration in held-out data to SantéMPI’s default matching configuration using sensitivity and specificity.
RESULTS: The machine learning–optimized configurations correctly detect over 90% of true record linkages as definite matches in all data sets, with 100% specificity and positive predictive value in all data sets, whereas the baseline detects none. In the largest data set examined, the baseline matching configuration detects possible record linkages with a sensitivity of 90.2% (95% CI 88.4%-92.0%) and specificity of 100%. By comparison, the machine learning–optimized matching configuration attains a sensitivity of 100%, with a decreased specificity of 95.9% (95% CI 95.9%-96.0%). We report significant gains in sensitivity in all data sets examined, at the cost of only marginally decreased specificity. The configuration optimization tool, data, and data set generator have been made freely available.
CONCLUSIONS: Our machine learning software tool can be used to significantly improve the performance of existing record linkage algorithms, without knowledge of the algorithm being used or specific details of the patient population being served.
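The abstract compares matching configurations by sensitivity, specificity, and positive predictive value on held-out record pairs. As a rough illustration of how those three figures follow from predicted and true linkages, here is a minimal Python sketch. It is not the authors' tool: the names `Counts`, `count_outcomes`, `sensitivity`, `specificity`, and `ppv`, and the toy labels at the end, are all assumptions made for this example.

```python
from typing import NamedTuple, Optional, Sequence


class Counts(NamedTuple):
    """Confusion-matrix counts for candidate record pairs (hypothetical helper)."""
    tp: int  # pair predicted as a match, and truly the same patient
    fp: int  # pair predicted as a match, but different patients
    tn: int  # pair predicted as a non-match, and truly different patients
    fn: int  # pair predicted as a non-match, but truly the same patient


def count_outcomes(predicted: Sequence[bool], actual: Sequence[bool]) -> Counts:
    """Tally outcomes for parallel sequences of predicted and true match labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1
        elif p and not a:
            fp += 1
        elif not p and a:
            fn += 1
        else:
            tn += 1
    return Counts(tp, fp, tn, fn)


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def sensitivity(c: Counts) -> Optional[float]:
    """Share of true linkages that the configuration detects."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: Counts) -> Optional[float]:
    """Share of true non-linkages that the configuration leaves unlinked."""
    return _ratio(c.tn, c.tn + c.fp)


def ppv(c: Counts) -> Optional[float]:
    """Share of predicted linkages that are correct (positive predictive value)."""
    return _ratio(c.tp, c.tp + c.fp)


# Toy labels, invented for illustration only (not data from the study).
actual = [True, False, False, True, True, False]
tuned = [True, True, False, False, True, False]   # some configuration's predictions
silent = [False] * len(actual)                    # a configuration that links nothing

tuned_counts = count_outcomes(tuned, actual)
silent_counts = count_outcomes(silent, actual)
```

A configuration that predicts no matches at all, like `silent` here, has sensitivity 0 and specificity 1, while its positive predictive value is undefined; the sketch returns `None` in that case rather than dividing by zero.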
format Online
Article
Text
id pubmed-10365597
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-10365597 2023-07-25 Optimizing Patient Record Linkage in a Master Patient Index Using Machine Learning: Algorithm Development and Validation Nelson, Walter; Khanna, Nityan; Ibrahim, Mohamed; Fyfe, Justin; Geiger, Maxwell; Edwards, Keith; Petch, Jeremy JMIR Form Res Original Paper JMIR Publications 2023-06-29 /pmc/articles/PMC10365597/ /pubmed/37384382 http://dx.doi.org/10.2196/44331 Text en ©Walter Nelson, Nityan Khanna, Mohamed Ibrahim, Justin Fyfe, Maxwell Geiger, Keith Edwards, Jeremy Petch. Originally published in JMIR Formative Research (https://formative.jmir.org), 29.06.2023. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included.
title Optimizing Patient Record Linkage in a Master Patient Index Using Machine Learning: Algorithm Development and Validation
topic Original Paper