Towards Eliminating Bias in Cluster Analysis of TB Genotyped Data

The relative contributions of transmission and reactivation of latent infection to TB cases observed clinically have been reported in many situations, but always with some uncertainty. Genotyped data from TB organisms obtained from patients have been used as the basis for heuristic distinctions betwe...

Bibliographic Details
Main authors: van Schalkwyk, Cari, Cule, Madeleine, Welte, Alex, van Helden, Paul, van der Spuy, Gian, Uys, Pieter
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315507/
https://www.ncbi.nlm.nih.gov/pubmed/22479534
http://dx.doi.org/10.1371/journal.pone.0034109
author van Schalkwyk, Cari
Cule, Madeleine
Welte, Alex
van Helden, Paul
van der Spuy, Gian
Uys, Pieter
collection PubMed
description The relative contributions of transmission and reactivation of latent infection to TB cases observed clinically have been reported in many situations, but always with some uncertainty. Genotyped data from TB organisms obtained from patients have been used as the basis for heuristic distinctions between circulating (clustered strains) and reactivated infections (unclustered strains). Naïve methods previously applied to the analysis of such data are known to provide biased estimates of the proportion of unclustered cases. The hypergeometric distribution, which generates probabilities of observing clusters of a given size as realized clusters of all possible sizes, is analyzed in this paper to yield a formal estimator for genotype cluster sizes. Subtle aspects of numerical stability, bias, and variance are explored. This formal estimator is seen to be stable with respect to the epidemiologically interesting properties of the cluster size distribution (the number of clusters and the number of singletons) though it does not yield satisfactory estimates of the number of clusters of larger sizes. The problem that even complete coverage of genotyping, in a practical sampling frame, will only provide a partial view of the actual transmission network remains to be explored.
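Illustrative note (not part of the record or the paper): the description refers to the hypergeometric distribution as the model linking true cluster sizes to observed ones. The Python sketch below shows only that forward sampling step, under the assumption that a fixed number of cases is genotyped uniformly at random from the population. The function name, parameter names, and the numbers used are hypothetical, and this is not the estimator derived in the paper.

```python
from math import comb


def observed_size_pmf(true_size, population, sampled):
    """Hypergeometric probabilities that k members of one true cluster
    appear in the sample, for k = 0 .. true_size.

    Assumption (ours, for illustration): `sampled` cases are drawn
    uniformly at random, without replacement, from `population` cases,
    of which `true_size` belong to the cluster of interest.
    """
    total = comb(population, sampled)
    pmf = []
    for k in range(true_size + 1):
        if k > sampled:
            # Cannot observe more cluster members than cases sampled.
            pmf.append(0.0)
        else:
            ways = comb(true_size, k) * comb(population - true_size, sampled - k)
            pmf.append(ways / total)
    return pmf


# Hypothetical numbers: a true pair in a population of 10 cases,
# with 5 cases genotyped.
pair_pmf = observed_size_pmf(true_size=2, population=10, sampled=5)
p_seen_as_singleton = pair_pmf[1]
```

With these hypothetical numbers, the true pair shows up as a single unclustered case more often than as a pair, which is the kind of bias in naive unclustered-case counts that the description mentions.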
format Online
Article
Text
id pubmed-3315507
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3315507 2012-04-04 Towards Eliminating Bias in Cluster Analysis of TB Genotyped Data van Schalkwyk, Cari Cule, Madeleine Welte, Alex van Helden, Paul van der Spuy, Gian Uys, Pieter PLoS One Research Article The relative contributions of transmission and reactivation of latent infection to TB cases observed clinically have been reported in many situations, but always with some uncertainty. Genotyped data from TB organisms obtained from patients have been used as the basis for heuristic distinctions between circulating (clustered strains) and reactivated infections (unclustered strains). Naïve methods previously applied to the analysis of such data are known to provide biased estimates of the proportion of unclustered cases. The hypergeometric distribution, which generates probabilities of observing clusters of a given size as realized clusters of all possible sizes, is analyzed in this paper to yield a formal estimator for genotype cluster sizes. Subtle aspects of numerical stability, bias, and variance are explored. This formal estimator is seen to be stable with respect to the epidemiologically interesting properties of the cluster size distribution (the number of clusters and the number of singletons) though it does not yield satisfactory estimates of the number of clusters of larger sizes. The problem that even complete coverage of genotyping, in a practical sampling frame, will only provide a partial view of the actual transmission network remains to be explored. Public Library of Science 2012-03-29 /pmc/articles/PMC3315507/ /pubmed/22479534 http://dx.doi.org/10.1371/journal.pone.0034109 Text en van Schalkwyk et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Towards Eliminating Bias in Cluster Analysis of TB Genotyped Data
topic Research Article