Estimating Sampling Selection Bias in Human Genetics: A Phenomenological Approach
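Main authors: | Risso, Davide; Taglioli, Luca; De Iasio, Sergio; Gueresi, Paola; Alfani, Guido; Nelli, Sergio; Rossi, Paolo; Paoli, Giorgio; Tofanelli, Sergio |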
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2015 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4599962/ https://www.ncbi.nlm.nih.gov/pubmed/26452043 http://dx.doi.org/10.1371/journal.pone.0140146 |
author | Risso, Davide; Taglioli, Luca; De Iasio, Sergio; Gueresi, Paola; Alfani, Guido; Nelli, Sergio; Rossi, Paolo; Paoli, Giorgio; Tofanelli, Sergio |
collection | PubMed |
description | This research is the first empirical attempt to calculate the various components of the hidden bias associated with the sampling strategies routinely used in human genetics, with special reference to surname-based strategies. We reconstructed surname distributions of 26 Italian communities with different demographic features across the last six centuries (years 1447–2001). The degree of overlap between "reference founding core" distributions and the distributions obtained by sampling the present-day communities with probabilistic and selective methods was quantified under different conditions and models. When only one individual per surname was considered (low kinship model), the average discrepancy was 59.5%, with a peak of 84% for random sampling. When multiple individuals per surname were considered (high kinship model), the discrepancy decreased by 8–30% at the cost of a larger variance. Criteria aimed at maximizing locally spread patrilineages and long-term residency appeared to be affected by recent gene flow much more than expected. Selecting the most frequent family names under low kinship criteria proved to be a suitable approach only for historically stable communities. In all other cases, true random sampling, despite its high variance, did not return more biased estimates than other selective methods. Our results indicate that sampling individuals who bear historically documented surnames (founders' method) should be applied, especially when studying the male-specific genome, to prevent an over-stratification of ancient and recent genetic components that heavily biases inferences and statistics. |
institution | National Center for Biotechnology Information |
journal | PLoS One, Research Article, 2015-10-09 |
rights | © 2015 Risso et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Estimating Sampling Selection Bias in Human Genetics: A Phenomenological Approach |
topic | Research Article |