Fast and optimal algorithm for case-control matching using registry data: application on the antibiotics use of colorectal cancer patients
BACKGROUND: In case-control studies most algorithms allow the controls to be sampled several times, which is not always optimal. If many controls are available and adjustment for several covariates is necessary, matching without replacement might increase statistical efficiency. Comparing similar un...
Main authors: | Mamouris, Pavlos; Nassiri, Vahid; Molenberghs, Geert; van den Akker, Marjan; van der Meer, Joep; Vaes, Bert |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
BioMed Central
2021
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8019172/ https://www.ncbi.nlm.nih.gov/pubmed/33810785 http://dx.doi.org/10.1186/s12874-021-01256-3 |
_version_ | 1783674325909372928 |
---|---|
author | Mamouris, Pavlos; Nassiri, Vahid; Molenberghs, Geert; van den Akker, Marjan; van der Meer, Joep; Vaes, Bert |
author_facet | Mamouris, Pavlos; Nassiri, Vahid; Molenberghs, Geert; van den Akker, Marjan; van der Meer, Joep; Vaes, Bert |
author_sort | Mamouris, Pavlos |
collection | PubMed |
description | BACKGROUND: In case-control studies most algorithms allow the controls to be sampled several times, which is not always optimal. If many controls are available and adjustment for several covariates is necessary, matching without replacement might increase statistical efficiency. Comparing similar units when having observational data is of utter importance, since confounding and selection bias is present. The aim was twofold, firstly to create a method that accommodates the option that a control is not resampled, and second, to display several scenarios that identify changes of Odds Ratios (ORs) while increasing the balance of the matched sample. METHODS: The algorithm was derived in an iterative way starting from the pre-processing steps to derive the data until its application in a study to investigate the risk of antibiotics on colorectal cancer in the INTEGO registry (Flanders, Belgium). Different scenarios were developed to investigate the fluctuation of ORs using the combination of exact and varying variables with or without replacement of controls. To achieve balance in the population, we introduced the Comorbidity Index (CI) variable, which is the sum of chronic diseases as a means to have comparable units for drawing valid associations. RESULTS: This algorithm is fast and optimal. We simulated data and demonstrated that the run-time of matching even with millions of patients is minimal. Optimal, since the closest controls is always captured (using the appropriate ordering and by creating some auxiliary variables), and in the scenario that a case has only one control, we assure that this control will be matched to this case, thus maximizing the cases to be used in the analysis. In total, 72 different scenarios were displayed indicating the fluctuation of ORs, and revealing patterns, especially a drop when balancing the population. 
CONCLUSIONS: We created an optimal and computationally efficient algorithm to derive a matched case-control sample with and without replacement of controls. The code and the functions are publicly available as an open source in an R package. Finally, we emphasize the importance of displaying several scenarios and assess the difference of ORs while using an index to balance population in observational data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-021-01256-3. |
format | Online Article Text |
id | pubmed-8019172 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-80191722021-04-05 Fast and optimal algorithm for case-control matching using registry data: application on the antibiotics use of colorectal cancer patients Mamouris, Pavlos Nassiri, Vahid Molenberghs, Geert van den Akker, Marjan van der Meer, Joep Vaes, Bert BMC Med Res Methodol Research Article BACKGROUND: In case-control studies most algorithms allow the controls to be sampled several times, which is not always optimal. If many controls are available and adjustment for several covariates is necessary, matching without replacement might increase statistical efficiency. Comparing similar units when having observational data is of utter importance, since confounding and selection bias is present. The aim was twofold, firstly to create a method that accommodates the option that a control is not resampled, and second, to display several scenarios that identify changes of Odds Ratios (ORs) while increasing the balance of the matched sample. METHODS: The algorithm was derived in an iterative way starting from the pre-processing steps to derive the data until its application in a study to investigate the risk of antibiotics on colorectal cancer in the INTEGO registry (Flanders, Belgium). Different scenarios were developed to investigate the fluctuation of ORs using the combination of exact and varying variables with or without replacement of controls. To achieve balance in the population, we introduced the Comorbidity Index (CI) variable, which is the sum of chronic diseases as a means to have comparable units for drawing valid associations. RESULTS: This algorithm is fast and optimal. We simulated data and demonstrated that the run-time of matching even with millions of patients is minimal. 
Optimal, since the closest controls is always captured (using the appropriate ordering and by creating some auxiliary variables), and in the scenario that a case has only one control, we assure that this control will be matched to this case, thus maximizing the cases to be used in the analysis. In total, 72 different scenarios were displayed indicating the fluctuation of ORs, and revealing patterns, especially a drop when balancing the population. CONCLUSIONS: We created an optimal and computationally efficient algorithm to derive a matched case-control sample with and without replacement of controls. The code and the functions are publicly available as an open source in an R package. Finally, we emphasize the importance of displaying several scenarios and assess the difference of ORs while using an index to balance population in observational data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-021-01256-3. BioMed Central 2021-04-02 /pmc/articles/PMC8019172/ /pubmed/33810785 http://dx.doi.org/10.1186/s12874-021-01256-3 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. 
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Article Mamouris, Pavlos Nassiri, Vahid Molenberghs, Geert van den Akker, Marjan van der Meer, Joep Vaes, Bert Fast and optimal algorithm for case-control matching using registry data: application on the antibiotics use of colorectal cancer patients |
title | Fast and optimal algorithm for case-control matching using registry data: application on the antibiotics use of colorectal cancer patients |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8019172/ https://www.ncbi.nlm.nih.gov/pubmed/33810785 http://dx.doi.org/10.1186/s12874-021-01256-3 |
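
The matching idea summarised in the description field (each control used at most once, the closest control chosen, and cases with a single eligible control served first) can be illustrated with a minimal Python sketch. This is not the authors' R package or their algorithm: the `Person` fields, the tolerances `age_tol` and `ci_tol`, the distance function, and the toy data are all assumptions made for illustration, and the greedy strategy shown is only one simple way to respect those three properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str
    sex: str   # exact-match variable (hypothetical choice)
    age: int   # varying variable, matched within a tolerance
    ci: int    # Comorbidity Index: number of chronic diseases


def match_without_replacement(cases, controls, age_tol=2, ci_tol=1):
    """Greedily give each case its closest control that is still unused."""

    def eligible(case, ctrl):
        return (ctrl.sex == case.sex
                and abs(ctrl.age - case.age) <= age_tol
                and abs(ctrl.ci - case.ci) <= ci_tol)

    def distance(case, ctrl):
        # Ties are broken by control id so the result is deterministic.
        return (abs(ctrl.age - case.age) + abs(ctrl.ci - case.ci), ctrl.pid)

    # Eligible controls per case, computed once up front.
    pools = {c.pid: [k for k in controls if eligible(c, k)] for c in cases}

    # Serve cases with the fewest eligible controls first, so a case with
    # a single candidate is not left empty-handed by an earlier case.
    ordered = sorted(cases, key=lambda c: (len(pools[c.pid]), c.pid))

    used, pairs = set(), {}
    for case in ordered:
        free = [k for k in pools[case.pid] if k.pid not in used]
        if free:
            best = min(free, key=lambda k: distance(case, k))
            used.add(best.pid)          # never hand this control out again
            pairs[case.pid] = best.pid
    return pairs


# Toy data (invented): ctrlA is closest to both cases, but it is the
# only eligible control for case2.
cases = [Person("case1", "F", 60, 2), Person("case2", "F", 61, 2)]
controls = [Person("ctrlA", "F", 61, 2), Person("ctrlB", "F", 58, 2)]
pairs = match_without_replacement(cases, controls)
print(pairs)
```

In this toy data, processing the cases in their input order would give `ctrlA` to `case1` and leave `case2` unmatched; ordering by the number of eligible controls keeps both cases in the matched sample. A greedy pass like this does not guarantee a globally minimal total distance.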