
Propensity scores as a novel method to guide sample allocation and minimize batch effects during the design of high throughput experiments

BACKGROUND: We developed a novel approach to minimize batch effects when assigning samples to batches. Our algorithm selects a batch allocation, among all possible ways of assigning samples to batches, that minimizes differences in average propensity score between batches. This strategy was compared...

Bibliographic Details
Main authors: Carry, Patrick M., Vigers, Tim, Vanderlinden, Lauren A., Keeter, Carson, Dong, Fran, Buckner, Teresa, Litkowski, Elizabeth, Yang, Ivana, Norris, Jill M., Kechris, Katerina
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9990331/
https://www.ncbi.nlm.nih.gov/pubmed/36882691
http://dx.doi.org/10.1186/s12859-023-05202-6
author Carry, Patrick M.
Vigers, Tim
Vanderlinden, Lauren A.
Keeter, Carson
Dong, Fran
Buckner, Teresa
Litkowski, Elizabeth
Yang, Ivana
Norris, Jill M.
Kechris, Katerina
author_sort Carry, Patrick M.
collection PubMed
description BACKGROUND: We developed a novel approach to minimize batch effects when assigning samples to batches. Our algorithm selects a batch allocation, among all possible ways of assigning samples to batches, that minimizes differences in average propensity score between batches. This strategy was compared to randomization and stratified randomization in a case–control study (30 per group) with a covariate (case vs control, represented as β1, set to be null) and two biologically relevant confounding variables (age, represented as β2, and hemoglobin A1c (HbA1c), represented as β3). Gene expression values were obtained from a publicly available dataset of expression data from pancreas islet cells. Batch effects were simulated as twice the median biological variation across the gene expression dataset and were added to the publicly available dataset to simulate a batch effect condition. Bias was calculated as the absolute difference between observed betas under the batch allocation strategies and the true beta (no batch effects). Bias was also evaluated after adjustment for batch effects using ComBat as well as a linear regression model. To understand the performance of our optimal allocation strategy under the alternative hypothesis, we also evaluated bias at a single gene associated with both age and HbA1c levels in the ‘true’ dataset (CAPN13 gene). RESULTS: Before batch correction, under the null hypothesis (β1), maximum absolute bias and root mean square (RMS) of maximum absolute bias were minimized using the optimal allocation strategy. Under the alternative hypothesis (β2 and β3 for the CAPN13 gene), maximum absolute bias and RMS of maximum absolute bias were also consistently lower using the optimal allocation strategy. ComBat and the regression batch adjustment methods performed well, as the bias estimates moved towards the true values in all conditions under both the null and alternative hypotheses. Although the differences between methods were less pronounced following batch correction, estimates of bias (average and RMS) were consistently lower using the optimal allocation strategy under both the null and alternative hypotheses. CONCLUSIONS: Our algorithm provides an extremely flexible and effective method for assigning samples to batches by exploiting knowledge of covariates prior to sample allocation. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05202-6.
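The allocation idea summarized in the description can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each sample already has an estimated propensity score (how the scores are estimated is not stated in this record), restricts the search to two equal-size batches, and uses a hypothetical function name, `best_two_batch_allocation`. It enumerates every equal-size split and keeps the one with the smallest absolute difference in mean propensity score between batches.

```python
from itertools import combinations
from statistics import mean


def best_two_batch_allocation(scores):
    """Exhaustively search equal-size splits of the samples into two batches.

    `scores` holds one propensity score per sample; the scores are assumed
    to have been estimated beforehand (an assumption of this sketch).
    Returns (batch_a_indices, batch_b_indices, abs_difference_of_means).
    """
    n = len(scores)
    if n < 2 or n % 2 != 0:
        raise ValueError("this sketch expects an even number of samples (at least 2)")

    indices = range(n)
    best = None
    for batch_a in combinations(indices, n // 2):
        chosen = set(batch_a)
        batch_b = tuple(i for i in indices if i not in chosen)
        # Criterion: absolute difference in average propensity score.
        diff = abs(
            mean(scores[i] for i in batch_a) - mean(scores[i] for i in batch_b)
        )
        if best is None or diff < best[2]:
            best = (batch_a, batch_b, diff)
    return best


# Illustrative scores only; these values are made up for the example.
scores = [0.1, 0.9, 0.2, 0.8, 0.4, 0.6]
batch_a, batch_b, diff = best_two_batch_allocation(scores)
print(batch_a, batch_b, diff)
```

Exhaustive enumeration is only practical for small sample sizes, because the number of possible splits grows combinatorially. For a study of the size described above (30 per group), a randomized or heuristic search over candidate allocations would likely be needed, and that would be a substitution for, not a reproduction of, the search over all allocations mentioned in the description.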
format Online
Article
Text
id pubmed-9990331
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9990331 2023-03-08 BMC Bioinformatics Research BioMed Central 2023-03-07 /pmc/articles/PMC9990331/ /pubmed/36882691 http://dx.doi.org/10.1186/s12859-023-05202-6 Text en © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Propensity scores as a novel method to guide sample allocation and minimize batch effects during the design of high throughput experiments
topic Research