
Power Laws for Heavy-Tailed Distributions: Modeling Allele and Haplotype Diversity for the National Marrow Donor Program


Bibliographic Details
Main Authors: Slater, Noa, Louzoun, Yoram, Gragert, Loren, Maiers, Martin, Chatterjee, Ansu, Albrecht, Mark
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4406525/
https://www.ncbi.nlm.nih.gov/pubmed/25901749
http://dx.doi.org/10.1371/journal.pcbi.1004204
_version_ 1782367778858074112
author Slater, Noa
Louzoun, Yoram
Gragert, Loren
Maiers, Martin
Chatterjee, Ansu
Albrecht, Mark
collection PubMed
description Measures of allele and haplotype diversity, which are fundamental properties in population genetics, often follow heavy-tailed distributions. These measures are of particular interest in the field of hematopoietic stem cell transplant (HSCT). Donor/recipient suitability for HSCT is determined by Human Leukocyte Antigen (HLA) similarity. Match predictions rely upon a precise description of HLA diversity, yet classical estimates are inaccurate given the heavy-tailed nature of the distribution. This directly affects HSCT matching and diversity measures in broader fields such as species richness. We have therefore developed a power-law-based estimator to measure allele and haplotype diversity that accommodates heavy tails using the concepts of regular variation and occupancy distributions. Application of our estimator to 6.59 million donors in the Be The Match Registry revealed that haplotypes follow a heavy-tailed distribution across all ethnicities: for example, 44.65% of the European American haplotypes are represented by only 1 individual. Indeed, our discovery rate of all U.S. European American haplotypes is estimated at 23.45% based upon sampling 3.97% of the population, leaving a large number of unobserved haplotypes. Population coverage, however, is much higher at 99.4% given that 90% of European Americans carry one of the 4.5% most frequent haplotypes. Alleles were found to be less diverse, suggesting that the current registry represents most alleles in the population. Thus, for HSCT registries, haplotype discovery will remain high with continued recruitment to a very deep level of sampling, but population coverage will not. Finally, we compared the convergence of our power-law estimator with that of classical diversity estimators such as capture-recapture, Chao, ACE, and jackknife methods. When fit to the haplotype data, our estimator displayed favorable properties in terms of convergence (with respect to sampling depth) and accuracy (with respect to diversity estimates). This suggests that power-law-based estimators offer a valid alternative to classical diversity estimators and may have broad applicability in the field of population genetics.
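The description contrasts the authors' power-law estimator with classical estimators such as Chao. As a rough illustration of the classical side only, the sketch below computes a Chao1-style richness estimate and a Good-Turing coverage estimate from a list of observed type labels. It is not the power-law estimator from the article; the function names and the toy sample are hypothetical and chosen only to show why singletons (types seen exactly once) drive these estimates.

```python
from collections import Counter


def frequency_of_frequencies(labels):
    """Map k -> number of distinct types observed exactly k times."""
    type_counts = Counter(labels)          # type -> number of observations
    return Counter(type_counts.values())   # k -> number of types seen k times


def chao1(labels):
    """Chao1-style estimate of the total number of types (seen and unseen)."""
    fof = frequency_of_frequencies(labels)
    s_obs = sum(fof.values())              # distinct types actually observed
    f1, f2 = fof.get(1, 0), fof.get(2, 0)  # singletons and doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # With no doubletons, fall back to the bias-corrected form.
    return s_obs + f1 * (f1 - 1) / 2


def good_turing_coverage(labels):
    """Estimated share of the population carrying an already observed type."""
    f1 = frequency_of_frequencies(labels).get(1, 0)
    return 1 - f1 / len(labels)


# Hypothetical toy sample: 7 observed types, 4 of them singletons.
sample = ["A"] * 5 + ["B"] * 3 + ["C"] * 2 + ["D", "E", "F", "G"]

print(chao1(sample))                 # 15.0
print(good_turing_coverage(sample))  # 1 - 4/14
```

In this toy sample more than half of the observed types are singletons, so the richness estimate (15) is roughly double the 7 types observed, while estimated coverage is still about 71%. The same qualitative gap between type discovery and population coverage is what the description reports for haplotypes.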
format Online
Article
Text
id pubmed-4406525
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4406525 2015-05-07 PLoS Comput Biol Research Article Public Library of Science 2015-04-22 /pmc/articles/PMC4406525/ /pubmed/25901749 http://dx.doi.org/10.1371/journal.pcbi.1004204 Text en © 2015 Slater et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Power Laws for Heavy-Tailed Distributions: Modeling Allele and Haplotype Diversity for the National Marrow Donor Program
topic Research Article