
On the Bias of Precision Estimation Under Separate Sampling


Bibliographic Details
Main authors: Xie, Shuilian; Braga-Neto, Ulisses M
Format: Online Article Text
Language: English
Published: SAGE Publications 2019
Subjects: Methodology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6636226/
https://www.ncbi.nlm.nih.gov/pubmed/31360060
http://dx.doi.org/10.1177/1176935119860822
collection PubMed
description Observational case-control studies for biomarker discovery in cancer often collect data that are sampled separately from the case and control populations. We present an analysis of the bias in the estimation of the precision of classifiers designed on separately sampled data. The analysis consists of both theoretical and numerical results, which show that classifier precision estimates can display strong bias under separate sampling, with the bias magnitude depending on the difference between the true case prevalence in the population and the sample prevalence in the data. We show that this bias is systematic in the sense that it cannot be reduced by increasing sample size. If information about the true case prevalence is available from public health records, then a modified precision estimator that uses the known prevalence displays smaller bias, which can in fact be reduced to zero as sample size increases under regularity conditions on the classification algorithm. The accuracy of the theoretical analysis and the performance of the precision estimators under separate sampling are confirmed by numerical experiments using synthetic and real data from published observational case-control studies. The results with real data confirm that, with separately sampled data, the usual estimator produces larger, i.e., more optimistic, precision estimates than the estimator using the true prevalence value.
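The contrast the description draws between the usual precision estimator and one that uses a known prevalence can be illustrated with a minimal sketch. It assumes the prevalence-based estimator takes the standard Bayes form, precision = c·sensitivity / (c·sensitivity + (1 − c)·false-positive rate), where c is the known population prevalence; the record does not give the authors' exact estimator, so this form, the function names, and the counts below are all illustrative assumptions.

```python
def usual_precision(tp, fp):
    """Plain sample precision TP / (TP + FP).

    Under separate sampling this implicitly uses the sample prevalence
    (fraction of cases in the data) rather than the population prevalence.
    """
    return tp / (tp + fp)


def prevalence_adjusted_precision(tp, fn, fp, tn, prevalence):
    """Precision via Bayes' rule with an externally supplied prevalence.

    Assumed form (not taken from the record): sensitivity and the
    false-positive rate are estimated within the case and control
    samples, then combined with the known population prevalence.
    """
    sensitivity = tp / (tp + fn)            # estimated from cases only
    false_positive_rate = fp / (fp + tn)    # estimated from controls only
    numerator = prevalence * sensitivity
    return numerator / (numerator + (1.0 - prevalence) * false_positive_rate)


# Hypothetical counts: 100 cases and 100 controls sampled separately,
# so the sample prevalence is 0.5 regardless of the population value.
tp, fn, fp, tn = 80, 20, 10, 90

usual = usual_precision(tp, fp)
adjusted = prevalence_adjusted_precision(tp, fn, fp, tn, prevalence=0.05)
at_sample_prevalence = prevalence_adjusted_precision(tp, fn, fp, tn, prevalence=0.5)

print(f"usual estimator:            {usual:.3f}")     # 0.889
print(f"adjusted, prevalence 0.05:  {adjusted:.3f}")  # 0.296
```

With these made-up counts the usual estimate is much more optimistic than the prevalence-adjusted one, and the two coincide when the supplied prevalence equals the sample prevalence, which matches the direction of the bias described above. The sketch omits guards for empty classes or zero denominators.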
format Online
Article
Text
id pubmed-6636226
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-6636226; 2019-07-29; On the Bias of Precision Estimation Under Separate Sampling; Xie, Shuilian; Braga-Neto, Ulisses M; Cancer Inform; Methodology
SAGE Publications 2019-07-15 /pmc/articles/PMC6636226/ /pubmed/31360060 http://dx.doi.org/10.1177/1176935119860822 Text en © The Author(s) 2019 http://www.creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
topic Methodology