Generating High Density, Low Cost Genotype Data in Soybean [Glycine max (L.) Merr.]

Bibliographic Details
Main authors: Happ, Mary M., Wang, Haichuan, Graef, George L., Hyten, David L.
Format: Online Article Text
Language: English
Published: Genetics Society of America 2019
Subjects: Investigations
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6643887/
https://www.ncbi.nlm.nih.gov/pubmed/31072870
http://dx.doi.org/10.1534/g3.119.400093
collection PubMed
description Obtaining genome-wide genotype information for millions of SNPs in soybean [Glycine max (L.) Merr.] often involves completely resequencing a line at 5X or greater coverage. Currently, hundreds of soybean lines have been resequenced at high depth, with their data deposited in the NCBI Short Read Archive. This publicly available dataset may be leveraged as an imputation reference panel, in combination with skim (low-coverage) sequencing of new soybean genotypes, to economically obtain high-density SNP information. Ninety-nine soybean lines resequenced at an average depth of 17.1X were used to generate a reference panel, with over 10 million SNPs called using GATK’s HaplotypeCaller tool. Whole-genome resequencing at approximately 1X depth was performed on 114 previously ungenotyped experimental soybean lines. Coverages down to 0.1X were analyzed by randomly subsetting raw reads from the original 1X sequence data. SNPs discovered in the reference panel were genotyped in the experimental lines after aligning reads to the soybean reference genome, and missing markers were imputed using Beagle 4.1. Sequencing depth of the experimental lines could be reduced to 0.3X while still retaining an accuracy of 97.8%. Accuracy was inversely related to minor allele frequency and highly correlated with marker linkage disequilibrium. The high accuracy of skim sequencing combined with imputation provides a low-cost method for obtaining dense genotypic information that can be used for various genomics applications in soybean.
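The read subsetting and accuracy evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which used GATK HaplotypeCaller and Beagle 4.1). The helper names `subsample_reads`, `genotype_concordance`, and `minor_allele_frequency` are hypothetical, reads are represented as plain strings, genotypes are assumed to be coded as 0/1/2 alternate-allele counts, and "accuracy" is assumed here to mean simple concordance between imputed and true genotype calls.

```python
import random
from typing import Iterable, List, Optional, Sequence


def subsample_reads(reads: Sequence[str], original_depth: float,
                    target_depth: float, seed: int = 0) -> List[str]:
    """Keep each read independently with probability target/original depth.

    Hypothetical helper: approximates lowering coverage (e.g. 1X -> 0.3X)
    by random read subsetting. It is not the authors' actual script.
    """
    if not 0 < target_depth <= original_depth:
        raise ValueError("target_depth must be in (0, original_depth]")
    fraction = target_depth / original_depth
    rng = random.Random(seed)  # seeded so the subset is reproducible
    return [read for read in reads if rng.random() < fraction]


def genotype_concordance(imputed: Sequence[Optional[int]],
                         truth: Sequence[Optional[int]]) -> float:
    """Fraction of sites where the imputed genotype equals the true genotype.

    Genotypes are 0/1/2 alternate-allele counts (assumed coding); a site
    that is None in either list is skipped.
    """
    pairs = [(i, t) for i, t in zip(imputed, truth)
             if i is not None and t is not None]
    if not pairs:
        raise ValueError("no sites with both genotypes present")
    return sum(i == t for i, t in pairs) / len(pairs)


def minor_allele_frequency(genotypes: Iterable[Optional[int]]) -> float:
    """MAF at one marker from diploid 0/1/2 alternate-allele counts."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


if __name__ == "__main__":
    reads = [f"read{i}" for i in range(10_000)]
    subset = subsample_reads(reads, original_depth=1.0, target_depth=0.3, seed=42)
    print(len(subset))  # about 3000; the exact count depends on the seed
    print(genotype_concordance([0, 2, 2, None, 0], [0, 2, 0, 2, 0]))  # 0.75
    print(minor_allele_frequency([0, 0, 0, 2, 2]))  # 0.4
```

Sampling each read independently gives an expected depth equal to the target, though the realized depth varies slightly from run to run; for paired-end data, both mates would need to be kept or dropped together.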
id pubmed-6643887
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal G3 (Bethesda), Investigations. Genetics Society of America, 2019-05-09.
rights Copyright © 2019 Happ et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Investigations