High density marker panels, SNPs prioritizing and accuracy of genomic selection

Bibliographic Details
Main authors: Chang, Ling-Yun; Toghiani, Sajjad; Ling, Ashley; Aggrey, Sammy E.; Rekaya, Romdhane
Format: Online article, text
Language: English
Published: BioMed Central 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5756446/
https://www.ncbi.nlm.nih.gov/pubmed/29304753
http://dx.doi.org/10.1186/s12863-017-0595-2

Abstract

BACKGROUND: The availability of high-density (HD) marker panels, genome-wide variants and sequence data creates an unprecedented opportunity to dissect the genetic basis of complex traits, enhance genomic selection (GS) and identify causal variants of disease. The disproportionate increase in the number of parameters in the genetic association model relative to the number of phenotypes has led to further deterioration in statistical power and to increased collinearity and false positive rates. At best, HD panels do not significantly improve GS accuracy; at worst, they reduce it. This is true for both regression and variance component approaches. To remedy this situation, some form of single nucleotide polymorphism (SNP) filtering or external information is needed. Current methods for prioritizing SNP markers (e.g., BayesB, BayesCπ) are sensitive to the increased collinearity in HD panels, which could limit their performance.

RESULTS: In this study, the usefulness of FST, a measure of allele frequency variation among populations, as an external source of information in GS was evaluated. A simulation was carried out for a trait with a heritability of 0.4. Data were divided into three subpopulations based on the phenotype distribution (bottom 5%, middle 90%, top 5%). Marker data were simulated to mimic 770 K and 1.5 million SNP marker panels. A ten-chromosome genome with 200 K and 400 K SNPs was simulated. Several scenarios with varying distributions for the quantitative trait loci (QTL) effects were simulated. Using all 200 K markers without filtering, the accuracy of genomic prediction was 0.77. When marker effects were simulated from a gamma distribution, SNPs pre-selected at the 99.5, 99.0 and 97.5% quantiles of the FST score distribution yielded accuracies of 0.725, 0.797 and 0.853, respectively. Similar results were observed under other simulation scenarios. Thus, accuracy comparable to that obtained using all SNPs can be achieved using only 0.5 to 1% of the markers.

CONCLUSIONS: These results indicate that SNP filtering using already available external information could increase the accuracy of GS. This is especially important as next-generation sequencing technology becomes more affordable and accessible for human, animal and plant applications.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12863-017-0595-2) contains supplementary material, which is available to authorized users.
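
The FST-based pre-selection summarized in RESULTS can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: genotypes are coded as 0/1/2 allele counts; FST is computed per SNP with Wright's formula Var(p) / (p̄(1 - p̄)) over the three phenotype-defined subpopulations, each weighted equally (the abstract does not state which estimator was used); and the toy simulation (1,000 individuals, 200 SNPs, 10 QTL with gamma effect sizes of assumed shape 0.4 and scale 1) is a hypothetical stand-in for the study's design. All function names are illustrative.

```python
import random
from statistics import mean, pvariance


def split_by_phenotype(phenotypes, low_q=0.05, high_q=0.95):
    """Indices of the bottom 5%, middle 90% and top 5% of individuals by phenotype."""
    order = sorted(range(len(phenotypes)), key=lambda i: phenotypes[i])
    lo, hi = round(len(order) * low_q), round(len(order) * high_q)
    return order[:lo], order[lo:hi], order[hi:]


def fst_from_freqs(freqs):
    """Assumption: Wright's FST = Var(p) / (pbar * (1 - pbar)), computed over
    subpopulation allele frequencies with each subpopulation weighted equally."""
    pbar = mean(freqs)
    denom = pbar * (1.0 - pbar)
    return pvariance(freqs) / denom if denom > 0 else 0.0  # monomorphic SNP -> 0


def fst_scores(genotypes, groups):
    """FST for every SNP; genotypes[i][s] counts copies (0/1/2) of one allele."""
    scores = []
    for s in range(len(genotypes[0])):
        freqs = [sum(genotypes[i][s] for i in g) / (2 * len(g)) for g in groups]
        scores.append(fst_from_freqs(freqs))
    return scores


def top_quantile(scores, q):
    """Sorted indices of the SNPs above the q-th quantile of the FST scores,
    i.e. the top (1 - q) fraction of markers (at least one SNP is kept)."""
    k = max(1, round((1.0 - q) * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)
    return sorted(ranked[:k])


def simulate(n_ind=1000, n_snp=200, n_qtl=10, h2=0.4, seed=1):
    """Hypothetical toy data (not the study's 10-chromosome, 200 K/400 K SNP
    design): additive QTL with gamma-distributed effect sizes and random sign."""
    rng = random.Random(seed)
    p = [rng.uniform(0.1, 0.9) for _ in range(n_snp)]
    geno = [[(rng.random() < f) + (rng.random() < f) for f in p]
            for _ in range(n_ind)]
    qtl = rng.sample(range(n_snp), n_qtl)
    eff = {s: rng.gammavariate(0.4, 1.0) * rng.choice((-1, 1)) for s in qtl}
    tbv = [sum(g[s] * b for s, b in eff.items()) for g in geno]
    sd_e = (pvariance(tbv) * (1 - h2) / h2) ** 0.5  # residual SD targeting heritability h2
    pheno = [t + rng.gauss(0.0, sd_e) for t in tbv]
    return geno, pheno, qtl


if __name__ == "__main__":
    geno, pheno, qtl = simulate()
    groups = split_by_phenotype(pheno)
    scores = fst_scores(geno, groups)
    for q in (0.995, 0.99, 0.975):  # quantile thresholds quoted in the abstract
        kept = top_quantile(scores, q)
        hits = len(set(kept) & set(qtl))
        print(f"q={q}: kept {len(kept)} SNPs, {hits} of {len(qtl)} QTL among them")
```

Only the prioritization step is shown: reporting a GS accuracy such as the 0.77 or 0.853 quoted above would additionally require fitting a genomic prediction model on the retained SNPs and evaluating its predictions, which this sketch does not attempt.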

Journal: BMC Genet (Research Article)
Published online: 2018-01-05
Rights: © The Author(s) 2018. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.