How and how much does RAD-seq bias genetic diversity estimates?

BACKGROUND: RAD-seq is a powerful tool, increasingly used in population genomics. However, earlier studies have raised red flags regarding possible biases associated with this technique. In particular, polymorphism on restriction sites results in preferential sampling of closely related haplotypes,...

Bibliographic Details
Main authors: Cariou, Marie, Duret, Laurent, Charlat, Sylvain
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5100275/
https://www.ncbi.nlm.nih.gov/pubmed/27825303
http://dx.doi.org/10.1186/s12862-016-0791-0
author Cariou, Marie
Duret, Laurent
Charlat, Sylvain
collection PubMed
description BACKGROUND: RAD-seq is a powerful tool, increasingly used in population genomics. However, earlier studies have raised red flags regarding possible biases associated with this technique. In particular, polymorphism in restriction sites results in preferential sampling of closely related haplotypes, so that RAD data tend to underestimate genetic diversity.
RESULTS: Here we (1) clarify the theoretical basis of this bias, highlighting the potential confounding effects of population structure and selection, (2) compare predictions with real data from in silico digestion of full genomes and (3) provide a proof of concept toward an ABC-based correction of the RAD-seq bias. Under a neutral and panmictic model, we confirm the previously established relationship between the true polymorphism and its RAD-based estimate, showing a more pronounced bias when polymorphism is high. Using more elaborate models, we show that selection, which results in heterogeneous levels of polymorphism along the genome, exacerbates the bias and leads to a more pronounced underestimation. In contrast, spatial genetic structure tends to reduce the bias. We compare the neutral and panmictic model with "ideal" empirical data (in silico RAD-sequencing) using full genomes from natural populations of the fruit fly Drosophila melanogaster and the fungus Schizophyllum commune, which harbour moderate and high genetic diversity, respectively. In D. melanogaster, the data fit the model's predictions, but the small difference between the true and RAD polymorphism makes this comparison insensitive to deviations from the model. In the highly polymorphic fungus, the model captures a large part of the bias but makes inaccurate predictions. Accordingly, ABC corrections based on this model improve the estimates, albeit with some imprecision.
CONCLUSION: The RAD-seq underestimation of genetic diversity associated with polymorphism in restriction sites becomes more pronounced when polymorphism is high. In practice, this means that in many systems where polymorphism does not exceed 2%, the bias is of minor importance in the face of other sources of uncertainty, such as heterogeneous base composition or technical artefacts. The neutral panmictic model provides a practical means to correct the bias through ABC, albeit with some imprecision. More elaborate ABC methods might integrate additional parameters, such as population structure and selection, but their opposite effects could hinder accurate corrections.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12862-016-0791-0) contains supplementary material, which is available to authorized users.
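The mechanism described in the abstract (a haplotype is only sampled when its restriction site is intact, so divergent haplotypes drop out) can be illustrated with a toy in silico digestion. This is a minimal sketch and not the authors' pipeline: the EcoRI recognition sequence `GAATTC`, the 10-bp tag length, the four hand-written haplotypes, and the helper names `rad_tag` and `pairwise_diversity` are all assumptions made for illustration.

```python
from itertools import combinations

SITE = "GAATTC"  # EcoRI recognition sequence; the enzyme choice is an assumption
TAG_LEN = 10     # hypothetical flanking tag length, far shorter than real RAD reads


def rad_tag(haplotype, site=SITE, tag_len=TAG_LEN, site_pos=0):
    """Return the tag flanking the site at site_pos, or None if the site is mutated."""
    if haplotype[site_pos:site_pos + len(site)] != site:
        return None  # no cut at this locus, so the haplotype is never sequenced
    start = site_pos + len(site)
    return haplotype[start:start + tag_len]


def pairwise_diversity(seqs):
    """Mean proportion of differing positions over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    total = sum(sum(a != b for a, b in zip(x, y)) / len(x) for x, y in pairs)
    return total / len(pairs)


# Toy aligned haplotypes (invented data): the last one carries a mutation in the
# restriction site and is also the most divergent in the flanking region.
haplotypes = [
    "GAATTC" + "ACGTACGTAC",
    "GAATTC" + "ACGTACGTAC",
    "GAATTC" + "ACGTACGTAT",
    "GAGTTC" + "TCGTACGAAT",
]

# "True" diversity uses the flanking region of every haplotype.
true_tags = [h[len(SITE):len(SITE) + TAG_LEN] for h in haplotypes]
true_pi = pairwise_diversity(true_tags)

# RAD-based diversity only sees haplotypes whose restriction site is intact.
sampled = [t for t in (rad_tag(h) for h in haplotypes) if t is not None]
rad_pi = pairwise_diversity(sampled)

print(f"true diversity: {true_pi:.4f}")
print(f"RAD diversity:  {rad_pi:.4f} from {len(sampled)} of {len(haplotypes)} haplotypes")
```

In this toy case the RAD-based value is lower than the true value because the one haplotype lost to the mutated site is also the most divergent one. Real data would need many loci, both strands, and a proper population model.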
format Online
Article
Text
id pubmed-5100275
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5100275 2016-11-08 How and how much does RAD-seq bias genetic diversity estimates? Cariou, Marie; Duret, Laurent; Charlat, Sylvain. BMC Evol Biol. Research Article. BioMed Central 2016-11-08 /pmc/articles/PMC5100275/ /pubmed/27825303 http://dx.doi.org/10.1186/s12862-016-0791-0 Text en © The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title How and how much does RAD-seq bias genetic diversity estimates?
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5100275/
https://www.ncbi.nlm.nih.gov/pubmed/27825303
http://dx.doi.org/10.1186/s12862-016-0791-0