High-quality genetic mapping with ddRADseq in the non-model tree Quercus rubra

Bibliographic Details
Main Authors: Konar, Arpita; Choudhury, Olivia; Bullis, Rebecca; Fiedler, Lauren; Kruser, Jacqueline M.; Stephens, Melissa T.; Gailing, Oliver; Schlarbaum, Scott; Coggeshall, Mark V.; Staton, Margaret E.; Carlson, John E.; Emrich, Scott; Romero-Severson, Jeanne
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5450186/
https://www.ncbi.nlm.nih.gov/pubmed/28558688
http://dx.doi.org/10.1186/s12864-017-3765-8
_version_ 1783239916097896448
author Konar, Arpita
Choudhury, Olivia
Bullis, Rebecca
Fiedler, Lauren
Kruser, Jacqueline M.
Stephens, Melissa T.
Gailing, Oliver
Schlarbaum, Scott
Coggeshall, Mark V.
Staton, Margaret E.
Carlson, John E.
Emrich, Scott
Romero-Severson, Jeanne
author_sort Konar, Arpita
collection PubMed
description BACKGROUND: Restriction site associated DNA sequencing (RADseq) has the potential to be a broadly applicable, low-cost approach for high-quality genetic linkage mapping in forest trees lacking a reference genome. The statistical inference of linear order must be as accurate as possible for the correct ordering of sequence scaffolds and contigs to chromosomal locations. Accurate maps also facilitate the discovery of chromosome segments containing allelic variants conferring resistance to the biotic and abiotic stresses that threaten forest trees worldwide. We used ddRADseq for genetic mapping in the tree Quercus rubra, with an approach optimized to produce a high-quality map. Our study design also enabled us to model the results we would have obtained with less depth of coverage.
RESULTS: Our sequencing design produced a high sequencing depth in the parents (248×) and a moderate sequencing depth (15×) in the progeny. The digital normalization method of generating a de novo reference and the SAMtools SNP variant caller yielded the most SNP calls (78,725). The major drivers of map inflation were multiple SNPs located within the same sequence (77% of SNPs called). The highest quality map was generated with a low level of missing data (5%) and a genome-wide threshold of 0.025 for deviation from Mendelian expectation. The final map included 849 SNP markers (1.1% of the 78,725 SNPs called). Downsampling the individual FASTQ files to model lower depth of coverage revealed that sequencing the progeny using 96 samples per lane would have yielded too few SNP markers to generate a map, even if we had sequenced the parents at depth 248×.
CONCLUSIONS: The ddRADseq technology produced enough high-quality SNP markers to make a moderately dense, high-quality map. The success of this project was due to high depth of coverage of the parents, moderate depth of coverage of the progeny, a good framework map, an optimized bioinformatics pipeline, and rigorous premapping filters. The ddRADseq approach is useful for the construction of high-quality genetic maps in organisms lacking a reference genome if the parents and progeny are sequenced at sufficient depth. Technical improvements in reduced representation sequencing (RRS) approaches are needed to reduce the amount of missing data.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3765-8) contains supplementary material, which is available to authorized users.
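The premapping filters named in the description (at most 5% missing data, and a 0.025 threshold for deviation from Mendelian expectation) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the genotype coding ("A", "B", None for a missing call), the assumption of a 1:1 expected segregation ratio, and the reading of 0.025 as a per-marker p-value cutoff are all assumptions made for illustration only.

```python
import math


def missing_fraction(calls):
    """Fraction of progeny with no genotype call (None) at one marker."""
    return sum(c is None for c in calls) / len(calls)


def segregation_p_value(calls):
    """Chi-square p-value (1 degree of freedom) for a 1:1 ratio.

    Assumes two genotype classes coded "A" and "B" (hypothetical coding).
    """
    observed = [c for c in calls if c is not None]
    n_a = sum(c == "A" for c in observed)
    n_b = len(observed) - n_a
    total = n_a + n_b
    if total == 0:
        return 0.0  # no data at all: treat as failing the test
    chi2 = (n_a - n_b) ** 2 / total
    # Survival function of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_filters(calls, max_missing=0.05, min_p=0.025):
    """Keep a marker only if it passes both illustrative filters."""
    return (missing_fraction(calls) <= max_missing
            and segregation_p_value(calls) >= min_p)


balanced = ["A"] * 50 + ["B"] * 50              # fits 1:1, no missing calls
distorted = ["A"] * 80 + ["B"] * 20             # strong segregation distortion
sparse = ["A"] * 45 + ["B"] * 45 + [None] * 10  # 10% missing calls

print(passes_filters(balanced), passes_filters(distorted), passes_filters(sparse))
```

Under these assumptions only the balanced marker is retained. A real analysis would use the expected ratio for each marker's segregation type and the thresholds as the paper defines them.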
format Online
Article
Text
id pubmed-5450186
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5450186 2017-06-01 High-quality genetic mapping with ddRADseq in the non-model tree Quercus rubra BMC Genomics Research Article BioMed Central 2017-05-30 /pmc/articles/PMC5450186/ /pubmed/28558688 http://dx.doi.org/10.1186/s12864-017-3765-8 Text en © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title High-quality genetic mapping with ddRADseq in the non-model tree Quercus rubra
topic Research Article