Power Laws for Heavy-Tailed Distributions: Modeling Allele and Haplotype Diversity for the National Marrow Donor Program
Measures of allele and haplotype diversity, which are fundamental properties in population genetics, often follow heavy-tailed distributions. These measures are of particular interest in the field of hematopoietic stem cell transplant (HSCT). Donor/Recipient suitability for HSCT is determined by Hum...
Main authors: | Slater, Noa; Louzoun, Yoram; Gragert, Loren; Maiers, Martin; Chatterjee, Ansu; Albrecht, Mark |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4406525/ https://www.ncbi.nlm.nih.gov/pubmed/25901749 http://dx.doi.org/10.1371/journal.pcbi.1004204 |
_version_ | 1782367778858074112 |
---|---|
author | Slater, Noa Louzoun, Yoram Gragert, Loren Maiers, Martin Chatterjee, Ansu Albrecht, Mark |
author_facet | Slater, Noa Louzoun, Yoram Gragert, Loren Maiers, Martin Chatterjee, Ansu Albrecht, Mark |
author_sort | Slater, Noa |
collection | PubMed |
description | Measures of allele and haplotype diversity, which are fundamental properties in population genetics, often follow heavy-tailed distributions. These measures are of particular interest in the field of hematopoietic stem cell transplant (HSCT). Donor/Recipient suitability for HSCT is determined by Human Leukocyte Antigen (HLA) similarity. Match predictions rely upon a precise description of HLA diversity, yet classical estimates are inaccurate given the heavy-tailed nature of the distribution. This directly affects HSCT matching and diversity measures in broader fields such as species richness. We, therefore, have developed a power-law-based estimator to measure allele and haplotype diversity that accommodates heavy tails using the concepts of regular variation and occupancy distributions. Application of our estimator to 6.59 million donors in the Be The Match Registry revealed that haplotypes follow a heavy-tailed distribution across all ethnicities: for example, 44.65% of the European American haplotypes are represented by only 1 individual. Indeed, our discovery rate of all U.S. European American haplotypes is estimated at 23.45% based upon sampling 3.97% of the population, leaving a large number of unobserved haplotypes. Population coverage, however, is much higher at 99.4% given that 90% of European Americans carry one of the 4.5% most frequent haplotypes. Alleles were found to be less diverse, suggesting the current registry represents most alleles in the population. Thus, for HSCT registries, haplotype discovery will remain high with continued recruitment to a very deep level of sampling, but population coverage will not. Finally, we compared the convergence of our power-law versus classical diversity estimators such as capture-recapture, Chao, ACE and Jackknife methods. When fit to the haplotype data, our estimator displayed favorable properties in terms of convergence (with respect to sampling depth) and accuracy (with respect to diversity estimates). This suggests that power-law-based estimators offer a valid alternative to classical diversity estimators and may have broad applicability in the field of population genetics. |
format | Online Article Text |
id | pubmed-4406525 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-44065252015-05-07 Power Laws for Heavy-Tailed Distributions: Modeling Allele and Haplotype Diversity for the National Marrow Donor Program Slater, Noa Louzoun, Yoram Gragert, Loren Maiers, Martin Chatterjee, Ansu Albrecht, Mark PLoS Comput Biol Research Article Measures of allele and haplotype diversity, which are fundamental properties in population genetics, often follow heavy tailed distributions. These measures are of particular interest in the field of hematopoietic stem cell transplant (HSCT). Donor/Recipient suitability for HSCT is determined by Human Leukocyte Antigen (HLA) similarity. Match predictions rely upon a precise description of HLA diversity, yet classical estimates are inaccurate given the heavy-tailed nature of the distribution. This directly affects HSCT matching and diversity measures in broader fields such as species richness. We, therefore, have developed a power-law based estimator to measure allele and haplotype diversity that accommodates heavy tails using the concepts of regular variation and occupancy distributions. Application of our estimator to 6.59 million donors in the Be The Match Registry revealed that haplotypes follow a heavy tail distribution across all ethnicities: for example, 44.65% of the European American haplotypes are represented by only 1 individual. Indeed, our discovery rate of all U.S. European American haplotypes is estimated at 23.45% based upon sampling 3.97% of the population, leaving a large number of unobserved haplotypes. Population coverage, however, is much higher at 99.4% given that 90% of European Americans carry one of the 4.5% most frequent haplotypes. Alleles were found to be less diverse suggesting the current registry represents most alleles in the population. Thus, for HSCT registries, haplotype discovery will remain high with continued recruitment to a very deep level of sampling, but population coverage will not. Finally, we compared the convergence of our power-law versus classical diversity estimators such as Capture recapture, Chao, ACE and Jackknife methods. When fit to the haplotype data, our estimator displayed favorable properties in terms of convergence (with respect to sampling depth) and accuracy (with respect to diversity estimates). This suggests that power-law based estimators offer a valid alternative to classical diversity estimators and may have broad applicability in the field of population genetics. Public Library of Science 2015-04-22 /pmc/articles/PMC4406525/ /pubmed/25901749 http://dx.doi.org/10.1371/journal.pcbi.1004204 Text en © 2015 Slater et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Slater, Noa Louzoun, Yoram Gragert, Loren Maiers, Martin Chatterjee, Ansu Albrecht, Mark Power Laws for Heavy-Tailed Distributions: Modeling Allele and Haplotype Diversity for the National Marrow Donor Program |
title | Power Laws for Heavy-Tailed Distributions: Modeling Allele and Haplotype Diversity for the National Marrow Donor Program |
title_full | Power Laws for Heavy-Tailed Distributions: Modeling Allele and Haplotype Diversity for the National Marrow Donor Program |
title_fullStr | Power Laws for Heavy-Tailed Distributions: Modeling Allele and Haplotype Diversity for the National Marrow Donor Program |
title_full_unstemmed | Power Laws for Heavy-Tailed Distributions: Modeling Allele and Haplotype Diversity for the National Marrow Donor Program |
title_short | Power Laws for Heavy-Tailed Distributions: Modeling Allele and Haplotype Diversity for the National Marrow Donor Program |
title_sort | power laws for heavy-tailed distributions: modeling allele and haplotype diversity for the national marrow donor program |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4406525/ https://www.ncbi.nlm.nih.gov/pubmed/25901749 http://dx.doi.org/10.1371/journal.pcbi.1004204 |
work_keys_str_mv | AT slaternoa powerlawsforheavytaileddistributionsmodelingalleleandhaplotypediversityforthenationalmarrowdonorprogram AT louzounyoram powerlawsforheavytaileddistributionsmodelingalleleandhaplotypediversityforthenationalmarrowdonorprogram AT gragertloren powerlawsforheavytaileddistributionsmodelingalleleandhaplotypediversityforthenationalmarrowdonorprogram AT maiersmartin powerlawsforheavytaileddistributionsmodelingalleleandhaplotypediversityforthenationalmarrowdonorprogram AT chatterjeeansu powerlawsforheavytaileddistributionsmodelingalleleandhaplotypediversityforthenationalmarrowdonorprogram AT albrechtmark powerlawsforheavytaileddistributionsmodelingalleleandhaplotypediversityforthenationalmarrowdonorprogram |
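
The abstract in this record names Chao among the classical diversity estimators and contrasts haplotype discovery with population coverage. As a rough illustration of what such classical quantities look like, the sketch below computes the bias-corrected Chao1 richness estimate and the Good-Turing coverage estimate from a sample of labels. This is not the authors' power-law estimator, and the record does not say which Chao variant the article used; the function names and the toy haplotype labels are hypothetical.

```python
from collections import Counter


def frequency_counts(labels):
    """Return (n, s_obs, f1, f2) for a sample of category labels."""
    abundances = Counter(labels)                  # label -> times seen in the sample
    freq_of_freq = Counter(abundances.values())   # k -> number of labels seen exactly k times
    n = sum(abundances.values())                  # sample size
    return n, len(abundances), freq_of_freq.get(1, 0), freq_of_freq.get(2, 0)


def chao1(s_obs, f1, f2):
    """Bias-corrected Chao1 estimate of the total number of categories."""
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def good_turing_coverage(n, f1):
    """Good-Turing estimate of the population share carried by observed categories."""
    return 1.0 - f1 / n


# Toy sample with hypothetical haplotype labels (assumption: not registry data).
sample = ["h1"] * 5 + ["h2"] * 3 + ["h3"] * 2 + ["h4", "h5", "h6"]
n, s_obs, f1, f2 = frequency_counts(sample)
richness = chao1(s_obs, f1, f2)
coverage = good_turing_coverage(n, f1)
print(n, s_obs, f1, f2, richness, round(coverage, 3))
```

In the toy sample, three of the six observed labels are singletons, so the richness estimate rises above the observed count while the estimated coverage stays well below 1. The abstract reports a similar tension at registry scale: many haplotypes remain unobserved even though population coverage is high.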