Non-random sampling leads to biased estimates of transcriptome association

Bibliographic Details
Main Authors: Foulkes, A. S., Balasubramanian, R., Qian, J., Reilly, M. P.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148323/
https://www.ncbi.nlm.nih.gov/pubmed/32277087
http://dx.doi.org/10.1038/s41598-020-62575-x
author Foulkes, A. S.
Balasubramanian, R.
Qian, J.
Reilly, M. P.
collection PubMed
description Integration of independent data resources across -omics platforms offers transformative opportunity for novel clinical and biological discoveries. However, application of emerging analytic methods in the context of selection bias represents a noteworthy and pervasive challenge. We hypothesize that combining differentially selected samples for integrated transcriptome analysis will lead to bias in the estimated association between predicted expression and the trait. Our results are based on in silico investigations and a case example focused on body mass index across four well-described cohorts apparently derived from markedly different populations. Our findings suggest that integrative analysis can lead to substantial relative bias in the estimate of association between predicted expression and the trait. The average estimate of association ranged from 51.3% less than to 96.7% greater than the true value for the biased sampling scenarios considered, while the average error was −2.7% for the unbiased scenario. The corresponding 95% confidence interval coverage rate ranged from 46.4% to 69.5% under biased sampling, and was equal to 75% for the unbiased scenario. Inverse probability weighting with observed and estimated weights is applied as one corrective measure and appears to reduce the bias and improve coverage. These results highlight a critical need to address selection bias in integrative analysis and to use caution in interpreting findings in the presence of different sampling mechanisms between groups.
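The description above refers to relative bias under non-random sampling and to inverse probability weighting (IPW) as a corrective measure. The following is a minimal sketch of that general idea only, not the authors' simulation design: it uses a single simulated cohort, a hypothetical selection rule in which the chance of being sampled rises with the trait, and known (rather than estimated) selection probabilities. The function names, the effect size `beta`, and the selection function are all assumptions made for illustration.

```python
import numpy as np


def weighted_slope(x, y, w=None):
    """Slope from a (weighted) least-squares fit of y on x with an intercept."""
    w = np.ones_like(x) if w is None else w
    x_bar = np.average(x, weights=w)
    y_bar = np.average(y, weights=w)
    return np.sum(w * (x - x_bar) * (y - y_bar)) / np.sum(w * (x - x_bar) ** 2)


def simulate_selection(beta=0.5, n=200_000, seed=0):
    """Compare naive and IPW slope estimates under trait-dependent sampling.

    All settings here are illustrative assumptions, not values from the article.
    """
    rng = np.random.default_rng(seed)

    # Source population: x stands in for predicted expression, y for the trait.
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)

    # Hypothetical selection mechanism: sampling probability increases with
    # the trait and is bounded in [0.1, 0.9], so weights stay between ~1.1 and 10.
    p_select = 0.1 + 0.8 / (1.0 + np.exp(-2.0 * y))
    keep = rng.random(n) < p_select
    xs, ys, ps = x[keep], y[keep], p_select[keep]

    # Naive fit ignores how the sample was drawn.
    naive = weighted_slope(xs, ys)
    # IPW fit weights each sampled unit by 1 / P(selected).
    ipw = weighted_slope(xs, ys, w=1.0 / ps)

    return {
        "naive_rel_bias": (naive - beta) / beta,
        "ipw_rel_bias": (ipw - beta) / beta,
    }


if __name__ == "__main__":
    for name, value in simulate_selection().items():
        print(f"{name}: {value:+.1%}")
```

Under these assumptions the naive fit should underestimate the true slope noticeably, while the weighted fit should land close to it. In practice the selection probabilities are rarely known and would have to be estimated, which the description notes the authors also consider.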
format Online
Article
Text
id pubmed-7148323
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7148323 2020-04-15 Non-random sampling leads to biased estimates of transcriptome association Foulkes, A. S. Balasubramanian, R. Qian, J. Reilly, M. P. Sci Rep Article
Nature Publishing Group UK 2020-04-10 /pmc/articles/PMC7148323/ /pubmed/32277087 http://dx.doi.org/10.1038/s41598-020-62575-x Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Non-random sampling leads to biased estimates of transcriptome association
topic Article