Optimizing Patient Record Linkage in a Master Patient Index Using Machine Learning: Algorithm Development and Validation
Main authors: | Nelson, Walter; Khanna, Nityan; Ibrahim, Mohamed; Fyfe, Justin; Geiger, Maxwell; Edwards, Keith; Petch, Jeremy |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10365597/ https://www.ncbi.nlm.nih.gov/pubmed/37384382 http://dx.doi.org/10.2196/44331 |
author | Nelson, Walter; Khanna, Nityan; Ibrahim, Mohamed; Fyfe, Justin; Geiger, Maxwell; Edwards, Keith; Petch, Jeremy |
collection | PubMed |
description | BACKGROUND: To provide quality care, modern health care systems must match and link data about the same patient from multiple sources, a function often served by master patient index (MPI) software. Record linkage in the MPI is typically performed manually by health care providers, guided by automated matching algorithms. These matching algorithms must be configured in advance, such as by setting the weights of patient attributes, usually by someone with knowledge of both the matching algorithm and the patient population being served. OBJECTIVE: We aimed to develop and evaluate a machine learning–based software tool that automatically configures a patient matching algorithm by learning from pairs of patient records, already present in the database, that were previously linked by humans. METHODS: We built a free and open-source software tool to optimize record linkage algorithm parameters based on historical record linkages. The tool uses Bayesian optimization to identify the set of configuration parameters that leads to optimal matching performance in a given patient population, by learning from prior record linkages made by humans. The tool assumes only the existence of a minimal HTTP application programming interface (API) and is therefore agnostic to the choice of MPI software, record linkage algorithm, and patient population. As a proof of concept, we integrated our tool with SantéMPI, an open-source MPI. We validated the tool on several synthetic patient populations in SantéMPI, comparing the performance of the optimized configuration on held-out data with that of SantéMPI’s default matching configuration in terms of sensitivity and specificity. RESULTS: In all data sets, the machine learning–optimized configurations correctly detect over 90% of true record linkages as definite matches, with 100% specificity and positive predictive value, whereas the baseline detects none. In the largest data set examined, the baseline matching configuration detects possible record linkages with a sensitivity of 90.2% (95% CI 88.4%-92.0%) and a specificity of 100%. By comparison, the machine learning–optimized matching configuration attains a sensitivity of 100%, with a decreased specificity of 95.9% (95% CI 95.9%-96.0%). We report significant gains in sensitivity in all data sets examined, at the cost of only marginally decreased specificity. The configuration optimization tool, data, and data set generator have been made freely available. CONCLUSIONS: Our machine learning software tool can be used to significantly improve the performance of existing record linkage algorithms, without knowledge of the algorithm being used or specific details of the patient population being served. |
format | Online Article Text |
id | pubmed-10365597 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
journal | JMIR Form Res (JMIR Formative Research), Original Paper, published 2023-06-29 |
rights | ©Walter Nelson, Nityan Khanna, Mohamed Ibrahim, Justin Fyfe, Maxwell Geiger, Keith Edwards, Jeremy Petch. Originally published in JMIR Formative Research (https://formative.jmir.org), 29.06.2023. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on https://formative.jmir.org, as well as this copyright and license information must be included. |
title | Optimizing Patient Record Linkage in a Master Patient Index Using Machine Learning: Algorithm Development and Validation |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10365597/ https://www.ncbi.nlm.nih.gov/pubmed/37384382 http://dx.doi.org/10.2196/44331 |
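
The abstract describes matching algorithms configured through attribute weights and a tool that tunes such parameters from record pairs that humans have already linked. A minimal sketch of that idea follows, under loudly stated assumptions: the attribute names, the exact-agreement weighted score, the two thresholds for possible and definite matches, the balanced-accuracy objective, and every function name are hypothetical, and plain random search stands in for the Bayesian optimization the authors actually use. In the described tool, interaction with the MPI goes through a minimal HTTP API rather than local functions like these, and nothing here reproduces SantéMPI's matching algorithm.

```python
import random
from dataclasses import dataclass

# Hypothetical attribute names; a real MPI such as SantéMPI defines its own fields.
ATTRIBUTES = ("family_name", "given_name", "birth_date", "gender")


@dataclass
class MatchConfig:
    """A toy matching configuration: per-attribute weights plus two thresholds."""
    weights: dict    # attribute -> weight added when both records agree on it
    possible: float  # scores at or above this are "possible" matches
    definite: float  # scores at or above this are "definite" matches


def score(a: dict, b: dict, cfg: MatchConfig) -> float:
    """Sum the weights of attributes whose values are present and exactly equal."""
    return sum(
        w for attr, w in cfg.weights.items()
        if a.get(attr) is not None and a.get(attr) == b.get(attr)
    )


def classify(a: dict, b: dict, cfg: MatchConfig) -> str:
    """Map a pair's score onto the hypothetical definite / possible / non-match bands."""
    s = score(a, b, cfg)
    if s >= cfg.definite:
        return "definite"
    if s >= cfg.possible:
        return "possible"
    return "non-match"


def balanced_accuracy(cfg: MatchConfig, pairs) -> float:
    """Assumed objective: mean of sensitivity and specificity, treating only
    'definite' as a predicted link. pairs holds (record_a, record_b, is_linked)
    tuples, where is_linked reflects a prior human linkage decision."""
    tp = fn = tn = fp = 0
    for a, b, linked in pairs:
        predicted = classify(a, b, cfg) == "definite"
        if linked and predicted:
            tp += 1
        elif linked:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one linked and one unlinked pair")
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def random_search(pairs, n_trials: int = 200, seed: int = 0):
    """Random search over weights and thresholds (a simple stand-in for
    Bayesian optimization): sample configurations, keep the best-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_val = None, float("-inf")
    for _ in range(n_trials):
        weights = {attr: rng.uniform(0.0, 5.0) for attr in ATTRIBUTES}
        total = sum(weights.values())
        # Draw two thresholds in [0, total] and order them so possible <= definite.
        possible, definite = sorted(rng.uniform(0.0, total) for _ in range(2))
        cfg = MatchConfig(weights, possible, definite)
        val = balanced_accuracy(cfg, pairs)
        if val > best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val
```

Replacing random search with a Bayesian optimizer would change only `random_search`; the scoring and objective functions are where the human linkage decisions enter the search.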
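
The abstract reports sensitivity, specificity, and positive predictive value with 95% CIs but does not state how the intervals were computed. As a hedged illustration only, the sketch below derives these metrics from confusion-matrix counts of record pairs and attaches a Wilson score interval, one common choice for binomial proportions; the function names are hypothetical and this is not necessarily the authors' method.

```python
import math

Z_95 = 1.959963984540054  # two-sided standard normal quantile for 95% coverage


def wilson_interval(successes: int, n: int, z: float = Z_95):
    """Wilson score interval for the binomial proportion successes / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


def _proportion(k: int, n: int):
    """Point estimate and Wilson interval, or None when undefined (n == 0)."""
    if n == 0:
        return None
    return k / n, wilson_interval(k, n)


def linkage_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity and PPV from confusion-matrix counts, where a
    positive is a record pair predicted (or truly) linked."""
    return {
        "sensitivity": _proportion(tp, tp + fn),
        "specificity": _proportion(tn, tn + fp),
        "ppv": _proportion(tp, tp + fp),
    }
```

Unlike the simple normal (Wald) approximation, the Wilson interval does not collapse to zero width at an estimate of exactly 0% or 100%, which matters for results such as the 100% specificities reported in the abstract.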