Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal

Bibliographic Details
Main authors: Vendrami, David L. J.; Forcada, Jaume; Hoffman, Joseph I.
Format: Online, Article, Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6341687/
https://www.ncbi.nlm.nih.gov/pubmed/30669975
http://dx.doi.org/10.1186/s12864-019-5440-8
author Vendrami, David L. J.
Forcada, Jaume
Hoffman, Joseph I.
collection PubMed
description BACKGROUND: Restriction site-associated DNA sequencing (RADseq) has revolutionized the study of wild organisms by allowing cost-effective genotyping of thousands of loci. However, for species lacking reference genomes, it can be challenging to select the restriction enzyme that offers the best balance between the number of RAD loci obtained and their depth of coverage, which is crucial for a successful outcome. To address this issue, PredRAD was recently developed; it uses probabilistic models to predict restriction site frequencies from a transcriptome assembly or other sequence resource based on either GC content or mono-, di- or trinucleotide composition. This program generates predictions that are broadly consistent with estimates of the true number of restriction sites obtained through in silico digestion of available reference genome assemblies. In practice, however, the actual number of loci obtained could differ: incomplete enzymatic digestion or patchy sequence coverage across the genome might leave some loci unrepresented in a RAD dataset, while erroneous assembly could inflate the number of loci. To investigate this, we used genome and transcriptome assemblies together with RADseq data from the Antarctic fur seal (Arctocephalus gazella) to compare PredRAD predictions with empirical estimates of the number of loci obtained via in silico digestion and from de novo assemblies. RESULTS: PredRAD consistently yielded higher predicted numbers of restriction sites for the transcriptome assembly than for the genome assembly. The trinucleotide and dinucleotide models also predicted higher frequencies than the mononucleotide or GC content models. Overall, the dinucleotide model applied to the transcriptome assembly and the trinucleotide model applied to the genome assembly generated the predictions closest to the number of restriction sites estimated by in silico digestion.
Furthermore, the number of de novo assembled RAD loci mapping to restriction sites was similar to the expectation based on in silico digestion. CONCLUSIONS: Our study reveals generally high concordance between PredRAD predictions and empirical estimates of the number of RAD loci. This further supports the utility of PredRAD, while also suggesting that it may be feasible to sequence and assemble the majority of RAD loci present in an organism’s genome. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-5440-8) contains supplementary material, which is available to authorized users.
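The abstract contrasts composition-based predictions of restriction site frequency (GC content, mononucleotide, dinucleotide models) with counts obtained by in silico digestion. The following is a minimal Python sketch of that general idea under stated assumptions; it is not PredRAD's implementation, and the function names, the random test sequence and the choice of SbfI (CCTGCAGG) as the example enzyme are all hypothetical. It assumes a palindromic recognition site (so scanning one strand finds every site) and computes the expected count as the number of site-length windows times the probability that a window matches the site. For simplicity the models are fitted to the same sequence that is digested, whereas in the setting described above composition comes from a transcriptome or other resource while the target is the genome.

```python
"""Sketch: composition-based predictions of restriction site frequency
compared with a direct in silico digestion count.

Assumptions (illustrative only, not PredRAD's code): sequences are uppercase
A/C/G/T strings; the recognition site is palindromic, so scanning one strand
finds every site; expected count = number of site-length windows multiplied
by the probability that a window matches the site.
"""
import random
from collections import Counter
from math import prod

BASES = "ACGT"


def gc_model_prob(site: str, gc: float) -> float:
    """P(window == site) given only GC content: G = C = gc/2, A = T = (1-gc)/2."""
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    return prod(p[b] for b in site)


def mono_model_prob(site: str, freqs: dict) -> float:
    """P(window == site) with independent bases at the observed frequencies."""
    return prod(freqs[b] for b in site)


def di_model_prob(site: str, seq: str) -> float:
    """P(window == site) under a first-order Markov (dinucleotide) model of seq."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    first = Counter(seq[:-1])  # how often each base starts a dinucleotide
    p = Counter(seq)[site[0]] / len(seq)
    for a, b in zip(site, site[1:]):
        if first[a] == 0:
            return 0.0
        p *= pairs[a + b] / first[a]  # transition probability P(b | a)
    return p


def count_sites(seq: str, site: str) -> int:
    """In silico digestion: count exact (possibly overlapping) site matches."""
    n, i = 0, seq.find(site)
    while i != -1:
        n += 1
        i = seq.find(site, i + 1)
    return n


def predict_sites(seq: str, site: str) -> dict:
    """Expected number of sites in seq under the GC, mono and di models."""
    n_windows = len(seq) - len(site) + 1
    counts = Counter(seq)
    freqs = {b: counts[b] / len(seq) for b in BASES}
    gc = freqs["G"] + freqs["C"]
    return {
        "gc": n_windows * gc_model_prob(site, gc),
        "mono": n_windows * mono_model_prob(site, freqs),
        "di": n_windows * di_model_prob(site, seq),
    }


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical random sequence standing in for an assembly; not real data.
    seq = "".join(rng.choices(BASES, k=2_000_000))
    site = "CCTGCAGG"  # SbfI recognition site, chosen only as an example
    print("predicted:", predict_sites(seq, site))
    print("observed: ", count_sites(seq, site))
```

On a strongly structured sequence such as "ACGT" repeated, the GC and mononucleotide models both give one site per 256 windows for the site ACGT, while the dinucleotide model, which captures the fixed base order, predicts close to the observed one site per four bases. This illustrates why higher-order models can yield higher predicted frequencies than the GC content or mononucleotide models.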
format Online
Article
Text
id pubmed-6341687
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6341687 2019-01-24 Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal Vendrami, David L. J. Forcada, Jaume Hoffman, Joseph I. BMC Genomics Research Article BioMed Central 2019-01-22 /pmc/articles/PMC6341687/ /pubmed/30669975 http://dx.doi.org/10.1186/s12864-019-5440-8 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Experimental validation of in silico predicted RAD locus frequencies using genomic resources and short read data from a model marine mammal
topic Research Article