Generating High Density, Low Cost Genotype Data in Soybean [Glycine max (L.) Merr.]
Obtaining genome-wide genotype information for millions of SNPs in soybean [Glycine max (L.) Merr.] often involves completely resequencing a line at 5X or greater coverage. Currently, hundreds of soybean lines have been resequenced at high depth levels with their data deposited in the NCBI Short Rea...
Main authors: | Happ, Mary M., Wang, Haichuan, Graef, George L., Hyten, David L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Genetics Society of America 2019 |
Subjects: | Investigations |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6643887/ https://www.ncbi.nlm.nih.gov/pubmed/31072870 http://dx.doi.org/10.1534/g3.119.400093 |
_version_ | 1783437176010178560 |
---|---|
author | Happ, Mary M. Wang, Haichuan Graef, George L. Hyten, David L. |
author_sort | Happ, Mary M. |
collection | PubMed |
description | Obtaining genome-wide genotype information for millions of SNPs in soybean [Glycine max (L.) Merr.] often involves completely resequencing a line at 5X or greater coverage. Currently, hundreds of soybean lines have been resequenced at high depth levels with their data deposited in the NCBI Short Read Archive. This publicly available dataset may be leveraged as an imputation reference panel in combination with skim (low coverage) sequencing of new soybean genotypes to economically obtain high-density SNP information. Ninety-nine soybean lines resequenced at an average of 17.1X were used to generate a reference panel, with over 10 million SNPs called using GATK’s Haplotype Caller tool. Whole genome resequencing at approximately 1X depth was performed on 114 previously ungenotyped experimental soybean lines. Coverages down to 0.1X were analyzed by randomly subsetting raw reads from the original 1X sequence data. SNPs discovered in the reference panel were genotyped in the experimental lines after aligning to the soybean reference genome, and missing markers imputed using Beagle 4.1. Sequencing depth of the experimental lines could be reduced to 0.3X while still retaining an accuracy of 97.8%. Accuracy was inversely related to minor allele frequency, and highly correlated with marker linkage disequilibrium. The high accuracy of skim sequencing combined with imputation provides a low cost method for obtaining dense genotypic information that can be used for various genomics applications in soybean. |
format | Online Article Text |
id | pubmed-6643887 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Genetics Society of America |
record_format | MEDLINE/PubMed |
spelling | pubmed-66438872019-07-25 Generating High Density, Low Cost Genotype Data in Soybean [Glycine max (L.) Merr.] Happ, Mary M. Wang, Haichuan Graef, George L. Hyten, David L. G3 (Bethesda) Investigations Obtaining genome-wide genotype information for millions of SNPs in soybean [Glycine max (L.) Merr.] often involves completely resequencing a line at 5X or greater coverage. Currently, hundreds of soybean lines have been resequenced at high depth levels with their data deposited in the NCBI Short Read Archive. This publicly available dataset may be leveraged as an imputation reference panel in combination with skim (low coverage) sequencing of new soybean genotypes to economically obtain high-density SNP information. Ninety-nine soybean lines resequenced at an average of 17.1X were used to generate a reference panel, with over 10 million SNPs called using GATK’s Haplotype Caller tool. Whole genome resequencing at approximately 1X depth was performed on 114 previously ungenotyped experimental soybean lines. Coverages down to 0.1X were analyzed by randomly subsetting raw reads from the original 1X sequence data. SNPs discovered in the reference panel were genotyped in the experimental lines after aligning to the soybean reference genome, and missing markers imputed using Beagle 4.1. Sequencing depth of the experimental lines could be reduced to 0.3X while still retaining an accuracy of 97.8%. Accuracy was inversely related to minor allele frequency, and highly correlated with marker linkage disequilibrium. The high accuracy of skim sequencing combined with imputation provides a low cost method for obtaining dense genotypic information that can be used for various genomics applications in soybean. Genetics Society of America 2019-05-09 /pmc/articles/PMC6643887/ /pubmed/31072870 http://dx.doi.org/10.1534/g3.119.400093 Text en Copyright © 2019 Happ et al. 
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Investigations Happ, Mary M. Wang, Haichuan Graef, George L. Hyten, David L. Generating High Density, Low Cost Genotype Data in Soybean [Glycine max (L.) Merr.] |
title | Generating High Density, Low Cost Genotype Data in Soybean [Glycine max (L.) Merr.] |
topic | Investigations |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6643887/ https://www.ncbi.nlm.nih.gov/pubmed/31072870 http://dx.doi.org/10.1534/g3.119.400093 |
work_keys_str_mv | AT happmarym generatinghighdensitylowcostgenotypedatainsoybeanglycinemaxlmerr AT wanghaichuan generatinghighdensitylowcostgenotypedatainsoybeanglycinemaxlmerr AT graefgeorgel generatinghighdensitylowcostgenotypedatainsoybeanglycinemaxlmerr AT hytendavidl generatinghighdensitylowcostgenotypedatainsoybeanglycinemaxlmerr |
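
The abstract describes two steps that are easy to illustrate: randomly subsetting reads from the roughly 1X data to reach lower coverages such as 0.3X or 0.1X, and scoring imputed genotypes against known calls. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline. The function names, the per-read random sampling scheme, and the exact-match definition of accuracy are all hypothetical; the record does not say which tool was used for subsetting or how accuracy was computed.

```python
import random


def subsample_fraction(original_depth, target_depth):
    """Fraction of reads to keep to go from original_depth to target_depth.

    Assumes depth scales linearly with the number of reads retained.
    """
    if not 0 < target_depth <= original_depth:
        raise ValueError("target depth must be positive and <= original depth")
    return target_depth / original_depth


def subsample_reads(reads, original_depth, target_depth, seed=0):
    """Keep each read independently with probability target/original.

    Hypothetical sampling scheme; a fixed seed makes the subset reproducible.
    """
    keep = subsample_fraction(original_depth, target_depth)
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < keep]


def concordance(true_calls, imputed_calls):
    """Share of compared genotypes whose imputed call matches the true call.

    Positions where either call is None (missing) are skipped. This
    exact-match definition is an assumption made for illustration.
    """
    pairs = [
        (t, i)
        for t, i in zip(true_calls, imputed_calls)
        if t is not None and i is not None
    ]
    if not pairs:
        raise ValueError("no comparable genotype calls")
    return sum(t == i for t, i in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy read identifiers standing in for the records of a FASTQ file.
    reads = [f"read_{n}" for n in range(10_000)]
    subset = subsample_reads(reads, original_depth=1.0, target_depth=0.3)
    print(f"kept {len(subset)} of {len(reads)} reads")

    truth = ["0/0", "0/1", "1/1", None]
    imputed = ["0/0", "1/1", "1/1", "0/0"]
    print(f"concordance: {concordance(truth, imputed):.3f}")
```

In this sketch a 1X dataset reduced to 0.3X keeps about 30% of the reads, and the concordance function ignores markers that lack a call in either set.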