Estimating Sampling Selection Bias in Human Genetics: A Phenomenological Approach

This research is the first empirical attempt to calculate the various components of the hidden bias associated with the sampling strategies routinely used in human genetics, with special reference to surname-based strategies. We reconstructed surname distributions of 26 Italian communities with diff...

Bibliographic Details
Main Authors: Risso, Davide; Taglioli, Luca; De Iasio, Sergio; Gueresi, Paola; Alfani, Guido; Nelli, Sergio; Rossi, Paolo; Paoli, Giorgio; Tofanelli, Sergio
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4599962/
https://www.ncbi.nlm.nih.gov/pubmed/26452043
http://dx.doi.org/10.1371/journal.pone.0140146
_version_ 1782394354904596480
author Risso, Davide
Taglioli, Luca
De Iasio, Sergio
Gueresi, Paola
Alfani, Guido
Nelli, Sergio
Rossi, Paolo
Paoli, Giorgio
Tofanelli, Sergio
author_sort Risso, Davide
collection PubMed
description This research is the first empirical attempt to calculate the various components of the hidden bias associated with the sampling strategies routinely used in human genetics, with special reference to surname-based strategies. We reconstructed surname distributions of 26 Italian communities with different demographic features across the last six centuries (years 1447–2001). The degree of overlap between "reference founding core" distributions and the distributions obtained from sampling the present-day communities by probabilistic and selective methods was quantified under different conditions and models. When taking into account only one individual per surname (low kinship model), the average discrepancy was 59.5%, with a peak of 84% by random sampling. When multiple individuals per surname were considered (high kinship model), the discrepancy decreased by 8–30% at the cost of a larger variance. Criteria aimed at maximizing locally spread patrilineages and long-term residency appeared to be affected by recent gene flows much more than expected. Selection of the more frequent family names following low kinship criteria proved to be a suitable approach only for historically stable communities. In any other case, true random sampling, despite its high variance, did not return more biased estimates than other selective methods. Our results indicate that the sampling of individuals bearing historically documented surnames (founders' method) should be applied, especially when studying the male-specific genome, to prevent an over-stratification of ancient and recent genetic components that heavily biases inferences and statistics.
format Online
Article
Text
id pubmed-4599962
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4599962 2015-10-20 Estimating Sampling Selection Bias in Human Genetics: A Phenomenological Approach Risso, Davide Taglioli, Luca De Iasio, Sergio Gueresi, Paola Alfani, Guido Nelli, Sergio Rossi, Paolo Paoli, Giorgio Tofanelli, Sergio PLoS One Research Article
Public Library of Science 2015-10-09 /pmc/articles/PMC4599962/ /pubmed/26452043 http://dx.doi.org/10.1371/journal.pone.0140146 Text en © 2015 Risso et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Estimating Sampling Selection Bias in Human Genetics: A Phenomenological Approach
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4599962/
https://www.ncbi.nlm.nih.gov/pubmed/26452043
http://dx.doi.org/10.1371/journal.pone.0140146