
Selecting RAD-Seq Data Analysis Parameters for Population Genetics: The More the Better?

Restriction site-associated DNA sequencing (RAD-seq) has become a powerful and widely used tool in molecular ecology studies because it allows the cost-effective recovery of thousands of polymorphic sites across individuals of non-model organisms. However, its successful implementation in population genetic...


Bibliographic Details
Main authors: Díaz-Arce, Natalia; Rodríguez-Ezpeleta, Naiara
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6549478/
https://www.ncbi.nlm.nih.gov/pubmed/31191624
http://dx.doi.org/10.3389/fgene.2019.00533
author Díaz-Arce, Natalia
Rodríguez-Ezpeleta, Naiara
collection PubMed
description Restriction site-associated DNA sequencing (RAD-seq) has become a powerful and widely used tool in molecular ecology studies because it allows the cost-effective recovery of thousands of polymorphic sites across individuals of non-model organisms. However, its successful implementation in population genetics relies on correct data processing that minimizes potential loci-assembly biases and consequent genotyping error rates. When no reference genome is available, RAD-seq data processing involves the assembly of hundreds of thousands of high-throughput sequencing reads into orthologous loci, for which various key parameter values need to be selected by the researcher. Previous studies exploring the effect of these parameter values found or assumed that a larger number of recovered polymorphic loci is associated with a better assembly. Here, using three RAD-seq datasets from different species, we explore the effect of read filtering, loci assembly and polymorphic site selection on the number of markers obtained and the genetic differentiation inferred using the Stacks software. We find (i) that recovery of higher numbers of polymorphic loci is not necessarily associated with higher genetic differentiation, (ii) that the presence of PCR duplicates, selected loci assembly parameters and selected SNP filtering parameters affect the number of recovered polymorphic loci and the degree of genetic differentiation, and (iii) that this effect is different in each dataset, meaning that defining a systematic universal protocol for RAD-seq data analysis may lead to missing relevant information about population differentiation.
format Online
Article
Text
id pubmed-6549478
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6549478 2019-06-12 Selecting RAD-Seq Data Analysis Parameters for Population Genetics: The More the Better? Díaz-Arce, Natalia; Rodríguez-Ezpeleta, Naiara. Front Genet. Genetics. Frontiers Media S.A. 2019-05-29 /pmc/articles/PMC6549478/ /pubmed/31191624 http://dx.doi.org/10.3389/fgene.2019.00533 Text en Copyright © 2019 Díaz-Arce and Rodríguez-Ezpeleta.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Selecting RAD-Seq Data Analysis Parameters for Population Genetics: The More the Better?
topic Genetics