Cargando…
How and how much does RAD-seq bias genetic diversity estimates?
BACKGROUND: RAD-seq is a powerful tool, increasingly used in population genomics. However, earlier studies have raised red flags regarding possible biases associated with this technique. In particular, polymorphism on restriction sites results in preferential sampling of closely related haplotypes,...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2016
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5100275/ https://www.ncbi.nlm.nih.gov/pubmed/27825303 http://dx.doi.org/10.1186/s12862-016-0791-0 |
_version_ | 1782466109270654976 |
---|---|
author | Cariou, Marie Duret, Laurent Charlat, Sylvain |
author_facet | Cariou, Marie Duret, Laurent Charlat, Sylvain |
author_sort | Cariou, Marie |
collection | PubMed |
description | BACKGROUND: RAD-seq is a powerful tool, increasingly used in population genomics. However, earlier studies have raised red flags regarding possible biases associated with this technique. In particular, polymorphism on restriction sites results in preferential sampling of closely related haplotypes, so that RAD data tends to underestimate genetic diversity. RESULTS: Here we (1) clarify the theoretical basis of this bias, highlighting the potential confounding effects of population structure and selection, (2) confront predictions to real data from in silico digestion of full genomes and (3) provide a proof of concept toward an ABC-based correction of the RAD-seq bias. Under a neutral and panmictic model, we confirm the previously established relationship between the true polymorphism and its RAD-based estimation, showing a more pronounced bias when polymorphism is high. Using more elaborate models, we show that selection, resulting in heterogeneous levels of polymorphism along the genome, exacerbates the bias and leads to a more pronounced underestimation. On the contrary, spatial genetic structure tends to reduce the bias. We confront the neutral and panmictic model to “ideal” empirical data (in silico RAD-sequencing) using full genomes from natural populations of the fruit fly Drosophila melanogaster and the fungus Shizophyllum commune, harbouring respectively moderate and high genetic diversity. In D. melanogaster, predictions fit the model, but the small difference between the true and RAD polymorphism makes this comparison insensitive to deviations from the model. In the highly polymorphic fungus, the model captures a large part of the bias but makes inaccurate predictions. Accordingly, ABC corrections based on this model improve the estimations, albeit with some imprecisions. CONCLUSION: The RAD-seq underestimation of genetic diversity associated with polymorphism in restriction sites becomes more pronounced when polymorphism is high. In practice, this means that in many systems where polymorphism does not exceed 2 %, the bias is of minor importance in the face of other sources of uncertainty, such as heterogeneous bases composition or technical artefacts. The neutral panmictic model provides a practical mean to correct the bias through ABC, albeit with some imprecisions. More elaborate ABC methods might integrate additional parameters, such as population structure and selection, but their opposite effects could hinder accurate corrections. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12862-016-0791-0) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5100275 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-51002752016-11-08 How and how much does RAD-seq bias genetic diversity estimates? Cariou, Marie Duret, Laurent Charlat, Sylvain BMC Evol Biol Research Article BACKGROUND: RAD-seq is a powerful tool, increasingly used in population genomics. However, earlier studies have raised red flags regarding possible biases associated with this technique. In particular, polymorphism on restriction sites results in preferential sampling of closely related haplotypes, so that RAD data tends to underestimate genetic diversity. RESULTS: Here we (1) clarify the theoretical basis of this bias, highlighting the potential confounding effects of population structure and selection, (2) confront predictions to real data from in silico digestion of full genomes and (3) provide a proof of concept toward an ABC-based correction of the RAD-seq bias. Under a neutral and panmictic model, we confirm the previously established relationship between the true polymorphism and its RAD-based estimation, showing a more pronounced bias when polymorphism is high. Using more elaborate models, we show that selection, resulting in heterogeneous levels of polymorphism along the genome, exacerbates the bias and leads to a more pronounced underestimation. On the contrary, spatial genetic structure tends to reduce the bias. We confront the neutral and panmictic model to “ideal” empirical data (in silico RAD-sequencing) using full genomes from natural populations of the fruit fly Drosophila melanogaster and the fungus Shizophyllum commune, harbouring respectively moderate and high genetic diversity. In D. melanogaster, predictions fit the model, but the small difference between the true and RAD polymorphism makes this comparison insensitive to deviations from the model. In the highly polymorphic fungus, the model captures a large part of the bias but makes inaccurate predictions. Accordingly, ABC corrections based on this model improve the estimations, albeit with some imprecisions. CONCLUSION: The RAD-seq underestimation of genetic diversity associated with polymorphism in restriction sites becomes more pronounced when polymorphism is high. In practice, this means that in many systems where polymorphism does not exceed 2 %, the bias is of minor importance in the face of other sources of uncertainty, such as heterogeneous bases composition or technical artefacts. The neutral panmictic model provides a practical mean to correct the bias through ABC, albeit with some imprecisions. More elaborate ABC methods might integrate additional parameters, such as population structure and selection, but their opposite effects could hinder accurate corrections. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12862-016-0791-0) contains supplementary material, which is available to authorized users. BioMed Central 2016-11-08 /pmc/articles/PMC5100275/ /pubmed/27825303 http://dx.doi.org/10.1186/s12862-016-0791-0 Text en © The Author(s). 2016 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Cariou, Marie Duret, Laurent Charlat, Sylvain How and how much does RAD-seq bias genetic diversity estimates? |
title | How and how much does RAD-seq bias genetic diversity estimates? |
title_full | How and how much does RAD-seq bias genetic diversity estimates? |
title_fullStr | How and how much does RAD-seq bias genetic diversity estimates? |
title_full_unstemmed | How and how much does RAD-seq bias genetic diversity estimates? |
title_short | How and how much does RAD-seq bias genetic diversity estimates? |
title_sort | how and how much does rad-seq bias genetic diversity estimates? |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5100275/ https://www.ncbi.nlm.nih.gov/pubmed/27825303 http://dx.doi.org/10.1186/s12862-016-0791-0 |
work_keys_str_mv | AT carioumarie howandhowmuchdoesradseqbiasgeneticdiversityestimates AT duretlaurent howandhowmuchdoesradseqbiasgeneticdiversityestimates AT charlatsylvain howandhowmuchdoesradseqbiasgeneticdiversityestimates |