
Fast and optimal algorithm for case-control matching using registry data: application on the antibiotics use of colorectal cancer patients

BACKGROUND: In case-control studies most algorithms allow the controls to be sampled several times, which is not always optimal. If many controls are available and adjustment for several covariates is necessary, matching without replacement might increase statistical efficiency. Comparing similar un...


Bibliographic Details
Main Authors: Mamouris, Pavlos, Nassiri, Vahid, Molenberghs, Geert, van den Akker, Marjan, van der Meer, Joep, Vaes, Bert
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8019172/
https://www.ncbi.nlm.nih.gov/pubmed/33810785
http://dx.doi.org/10.1186/s12874-021-01256-3
author Mamouris, Pavlos
Nassiri, Vahid
Molenberghs, Geert
van den Akker, Marjan
van der Meer, Joep
Vaes, Bert
collection PubMed
description BACKGROUND: In case-control studies, most algorithms allow controls to be sampled several times, which is not always optimal. If many controls are available and adjustment for several covariates is necessary, matching without replacement might increase statistical efficiency. Comparing similar units is of utmost importance in observational data, since confounding and selection bias are present. The aim was twofold: first, to create a method that accommodates the option that a control is not resampled, and second, to display several scenarios that identify changes in odds ratios (ORs) while increasing the balance of the matched sample. METHODS: The algorithm was derived iteratively, from the pre-processing steps that prepare the data through to its application in a study investigating the risk of colorectal cancer associated with antibiotic use in the INTEGO registry (Flanders, Belgium). Different scenarios were developed to investigate the fluctuation of ORs using combinations of exact and varying variables, with or without replacement of controls. To achieve balance in the population, we introduced the Comorbidity Index (CI) variable, the sum of a patient's chronic diseases, as a means of obtaining comparable units for drawing valid associations. RESULTS: The algorithm is fast and optimal. Using simulated data, we demonstrated that the run-time of matching is minimal even with millions of patients. It is optimal because the closest control is always captured (using the appropriate ordering and by creating some auxiliary variables), and when a case has only one control, we ensure that this control is matched to that case, thus maximizing the number of cases used in the analysis. In total, 72 different scenarios were displayed, indicating the fluctuation of ORs and revealing patterns, especially a drop when balancing the population.
CONCLUSIONS: We created an optimal and computationally efficient algorithm to derive a matched case-control sample with and without replacement of controls. The code and functions are publicly available as open source in an R package. Finally, we emphasize the importance of displaying several scenarios and assessing the difference in ORs while using an index to balance the population in observational data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-021-01256-3.
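The ordering idea in the abstract (match without replacement, and serve cases with the fewest eligible controls first) can be illustrated with a minimal sketch. This is not the authors' R package or its API: the `Person` record, the `match_cases` function, the tolerance parameters, and the sample data are all hypothetical, and the greedy strategy shown is only one simple way to realize the idea. It assumes sex is matched exactly, while age and the Comorbidity Index are matched within a tolerance.

```python
from typing import Dict, List, NamedTuple, Optional


class Person(NamedTuple):
    pid: str
    sex: str  # matched exactly (hypothetical choice of exact variable)
    age: int  # matched within a tolerance
    ci: int   # comorbidity index: number of chronic diseases


def match_cases(
    cases: List[Person],
    controls: List[Person],
    age_tol: int = 2,
    ci_tol: int = 0,
) -> Dict[str, Optional[str]]:
    """Greedy 1:1 matching without replacement (illustrative only)."""

    def eligible(case: Person) -> List[Person]:
        # Controls that agree on sex and fall within both tolerances.
        return [
            c for c in controls
            if c.sex == case.sex
            and abs(c.age - case.age) <= age_tol
            and abs(c.ci - case.ci) <= ci_tol
        ]

    candidates = {case.pid: eligible(case) for case in cases}

    # Cases with the fewest eligible controls go first, so a case with
    # a single candidate is not left empty-handed by an earlier case.
    order = sorted(cases, key=lambda case: len(candidates[case.pid]))

    used = set()  # ids of controls that have already been assigned
    matches: Dict[str, Optional[str]] = {}
    for case in order:
        free = [c for c in candidates[case.pid] if c.pid not in used]
        if not free:
            matches[case.pid] = None  # no unused control is left
            continue
        # Closest control: smallest age gap, then smallest CI gap.
        best = min(
            free,
            key=lambda c: (abs(c.age - case.age), abs(c.ci - case.ci)),
        )
        used.add(best.pid)
        matches[case.pid] = best.pid
    return matches


# Case A can only use control c1; case B can use c1 or c2.
cases = [Person("B", "F", 62, 1), Person("A", "F", 60, 1)]
controls = [Person("c1", "F", 60, 1), Person("c2", "F", 64, 1)]
matches = match_cases(cases, controls)
print(matches)
```

In this toy data, processing the cases in input order would give c1 to B and leave A unmatched; sorting by the number of eligible controls lets both cases be used. A greedy pass of this kind does not by itself guarantee a globally optimal assignment in every configuration.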
format Online
Article
Text
id pubmed-8019172
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8019172 2021-04-05 Fast and optimal algorithm for case-control matching using registry data: application on the antibiotics use of colorectal cancer patients Mamouris, Pavlos Nassiri, Vahid Molenberghs, Geert van den Akker, Marjan van der Meer, Joep Vaes, Bert BMC Med Res Methodol Research Article BACKGROUND: In case-control studies, most algorithms allow controls to be sampled several times, which is not always optimal. If many controls are available and adjustment for several covariates is necessary, matching without replacement might increase statistical efficiency. Comparing similar units is of utmost importance in observational data, since confounding and selection bias are present. The aim was twofold: first, to create a method that accommodates the option that a control is not resampled, and second, to display several scenarios that identify changes in odds ratios (ORs) while increasing the balance of the matched sample. METHODS: The algorithm was derived iteratively, from the pre-processing steps that prepare the data through to its application in a study investigating the risk of colorectal cancer associated with antibiotic use in the INTEGO registry (Flanders, Belgium). Different scenarios were developed to investigate the fluctuation of ORs using combinations of exact and varying variables, with or without replacement of controls. To achieve balance in the population, we introduced the Comorbidity Index (CI) variable, the sum of a patient's chronic diseases, as a means of obtaining comparable units for drawing valid associations. RESULTS: The algorithm is fast and optimal. Using simulated data, we demonstrated that the run-time of matching is minimal even with millions of patients. It is optimal because the closest control is always captured (using the appropriate ordering and by creating some auxiliary variables), and when a case has only one control, we ensure that this control is matched to that case, thus maximizing the number of cases used in the analysis. In total, 72 different scenarios were displayed, indicating the fluctuation of ORs and revealing patterns, especially a drop when balancing the population. CONCLUSIONS: We created an optimal and computationally efficient algorithm to derive a matched case-control sample with and without replacement of controls. The code and functions are publicly available as open source in an R package. Finally, we emphasize the importance of displaying several scenarios and assessing the difference in ORs while using an index to balance the population in observational data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-021-01256-3. BioMed Central 2021-04-02 /pmc/articles/PMC8019172/ /pubmed/33810785 http://dx.doi.org/10.1186/s12874-021-01256-3 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research Article
Mamouris, Pavlos
Nassiri, Vahid
Molenberghs, Geert
van den Akker, Marjan
van der Meer, Joep
Vaes, Bert
Fast and optimal algorithm for case-control matching using registry data: application on the antibiotics use of colorectal cancer patients
title Fast and optimal algorithm for case-control matching using registry data: application on the antibiotics use of colorectal cancer patients
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8019172/
https://www.ncbi.nlm.nih.gov/pubmed/33810785
http://dx.doi.org/10.1186/s12874-021-01256-3