Predicting RAD-seq Marker Numbers across the Eukaryotic Tree of Life

High-throughput sequencing of reduced representation libraries obtained through digestion with restriction enzymes—generically known as restriction site associated DNA sequencing (RAD-seq)—is a common strategy to generate genome-wide genotypic and sequence data from eukaryotes. A critical design ele...

Bibliographic Details
Main authors: Herrera, Santiago, Reyes-Herrera, Paula H., Shank, Timothy M.
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4700943/
https://www.ncbi.nlm.nih.gov/pubmed/26537225
http://dx.doi.org/10.1093/gbe/evv210
_version_ 1782408402843992064
author Herrera, Santiago
Reyes-Herrera, Paula H.
Shank, Timothy M.
author_facet Herrera, Santiago
Reyes-Herrera, Paula H.
Shank, Timothy M.
author_sort Herrera, Santiago
collection PubMed
description High-throughput sequencing of reduced representation libraries obtained through digestion with restriction enzymes—generically known as restriction site associated DNA sequencing (RAD-seq)—is a common strategy to generate genome-wide genotypic and sequence data from eukaryotes. A critical design element of any RAD-seq study is knowledge of the approximate number of genetic markers that can be obtained for a taxon using different restriction enzymes, as this number determines the scope of a project, and ultimately defines its success. This number can only be directly determined if a reference genome sequence is available, or it can be estimated if the genome size and restriction recognition sequence probabilities are known. However, both scenarios are uncommon for nonmodel species. Here, we performed systematic in silico surveys of recognition sequences, for diverse and commonly used type II restriction enzymes across the eukaryotic tree of life. Our observations reveal that recognition sequence frequencies for a given restriction enzyme are strikingly variable among broad eukaryotic taxonomic groups, being largely determined by phylogenetic relatedness. We demonstrate that genome sizes can be predicted from cleavage frequency data obtained with restriction enzymes targeting “neutral” elements. Models based on genomic compositions are also effective tools to accurately calculate probabilities of recognition sequences across taxa, and can be applied to species for which reduced representation data are available (including transcriptomes and neutral RAD-seq data sets). The analytical pipeline developed in this study, PredRAD (https://github.com/phrh/PredRAD), and the resulting databases constitute valuable resources that will help guide the design of any study using RAD-seq or related methods.
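The estimate described in the abstract (marker numbers from genome size and recognition sequence probabilities) can be illustrated with a minimal sketch. This is not the PredRAD implementation: it assumes the simplest possible composition model, in which bases are independent and only GC content matters, and the function names are hypothetical. It also assumes the common single-enzyme convention that each cut site yields up to two RAD tags, one on each flank. A direct count on a known sequence is included for comparison with the in silico survey approach.

```python
def site_probability(recognition_seq: str, gc_content: float) -> float:
    """Probability of the recognition sequence at a given position,
    assuming independent bases drawn from the GC content (an assumption;
    real genomes deviate from this model)."""
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be between 0 and 1")
    p_gc = gc_content / 2.0          # probability of G, and of C
    p_at = (1.0 - gc_content) / 2.0  # probability of A, and of T
    prob = 1.0
    for base in recognition_seq.upper():
        if base in "GC":
            prob *= p_gc
        elif base in "AT":
            prob *= p_at
        else:
            raise ValueError(f"unsupported base: {base!r}")
    return prob


def expected_sites(genome_size_bp: int, recognition_seq: str,
                   gc_content: float) -> float:
    """Expected number of cut sites in a genome of the given size."""
    positions = max(genome_size_bp - len(recognition_seq) + 1, 0)
    return positions * site_probability(recognition_seq, gc_content)


def expected_rad_tags(genome_size_bp: int, recognition_seq: str,
                      gc_content: float) -> float:
    """Expected RAD tags, assuming two tags per cut site."""
    return 2.0 * expected_sites(genome_size_bp, recognition_seq, gc_content)


def count_sites(sequence: str, recognition_seq: str) -> int:
    """Observed number of sites on one strand, counting overlaps.
    For palindromic recognition sequences one strand is sufficient."""
    sequence = sequence.upper()
    target = recognition_seq.upper()
    count = 0
    start = sequence.find(target)
    while start != -1:
        count += 1
        start = sequence.find(target, start + 1)
    return count


if __name__ == "__main__":
    # Hypothetical 500 Mb genome with 40% GC, cut with SbfI (CCTGCAGG).
    print(expected_rad_tags(500_000_000, "CCTGCAGG", 0.40))
```

Comparing `count_sites` on an assembled genome with `expected_sites` for the same genome size and GC content gives a rough sense of how far observed frequencies depart from the simple model, which is the kind of variation among taxa that the abstract reports.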
format Online
Article
Text
id pubmed-4700943
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4700943 2016-01-06 Predicting RAD-seq Marker Numbers across the Eukaryotic Tree of Life. Herrera, Santiago; Reyes-Herrera, Paula H.; Shank, Timothy M. Genome Biol Evol. Genome Resources. Oxford University Press 2015-11-03 /pmc/articles/PMC4700943/ /pubmed/26537225 http://dx.doi.org/10.1093/gbe/evv210 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Predicting RAD-seq Marker Numbers across the Eukaryotic Tree of Life
topic Genome Resources
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4700943/
https://www.ncbi.nlm.nih.gov/pubmed/26537225
http://dx.doi.org/10.1093/gbe/evv210