DiscoSnp-RAD: de novo detection of small variants for RAD-Seq population genomics

Restriction site Associated DNA Sequencing (RAD-Seq) is a technique that sequences specific loci along the genome. It is widely employed in evolutionary biology because it gives access to variant information (mainly Single Nucleotide Polymorphisms, SNPs) from entire...

Bibliographic Details
Main Authors: Gauthier, Jérémy; Mouden, Charlotte; Suchan, Tomasz; Alvarez, Nadir; Arrigo, Nils; Riou, Chloé; Lemaitre, Claire; Peterlongo, Pierre
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7293188/
https://www.ncbi.nlm.nih.gov/pubmed/32566401
http://dx.doi.org/10.7717/peerj.9291
author Gauthier, Jérémy
Mouden, Charlotte
Suchan, Tomasz
Alvarez, Nadir
Arrigo, Nils
Riou, Chloé
Lemaitre, Claire
Peterlongo, Pierre
collection PubMed
description Restriction site Associated DNA Sequencing (RAD-Seq) is a technique that sequences specific loci along the genome. It is widely employed in evolutionary biology because it gives access to variant information (mainly Single Nucleotide Polymorphisms, SNPs) from entire populations at a reduced cost. Common RAD-dedicated tools, such as STACKS or IPyRAD, are based on all-vs-all read alignments, which require considerable time and computing resources. We present an original method, DiscoSnp-RAD, that avoids this pitfall: variants are detected by exploiting specific parts of the assembly graph built from the reads, so no all-vs-all read alignment is needed. We tested the implementation on simulated datasets of increasing size, up to 1,000 samples, and on real RAD-Seq data from 259 specimens of Chiastocheta flies, morphologically assigned to seven species. All individuals were successfully assigned to their species using both STRUCTURE and Maximum Likelihood phylogenetic reconstruction. Moreover, the identified variants revealed a within-species genetic structure linked to geographic distribution. Furthermore, our results show that DiscoSnp-RAD is significantly faster than state-of-the-art tools. Overall, the results show that DiscoSnp-RAD is suitable for identifying variants from RAD-Seq data, that it does not require time-consuming parameterization steps, and that it stands out from other tools because of its completely different principle, which makes it substantially faster, in particular on large datasets.
format Online
Article
Text
id pubmed-7293188
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7293188 2020-06-18 DiscoSnp-RAD: de novo detection of small variants for RAD-Seq population genomics Gauthier, Jérémy; Mouden, Charlotte; Suchan, Tomasz; Alvarez, Nadir; Arrigo, Nils; Riou, Chloé; Lemaitre, Claire; Peterlongo, Pierre PeerJ Bioinformatics PeerJ Inc. 2020-06-10 /pmc/articles/PMC7293188/ /pubmed/32566401 http://dx.doi.org/10.7717/peerj.9291 Text en © 2020 Gauthier et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title DiscoSnp-RAD: de novo detection of small variants for RAD-Seq population genomics
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7293188/
https://www.ncbi.nlm.nih.gov/pubmed/32566401
http://dx.doi.org/10.7717/peerj.9291