Contributions of linkage disequilibrium and co-segregation information to the accuracy of genomic prediction
Main authors: | , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
BioMed Central
2016
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5060012/ https://www.ncbi.nlm.nih.gov/pubmed/27729012 http://dx.doi.org/10.1186/s12711-016-0255-4 |
Summary: | BACKGROUND: Traditional genomic prediction models using multiple regression on single nucleotide polymorphism (SNP) genotypes exploit associations between genotypes of quantitative trait loci (QTL) and SNPs, which can be created by historical linkage disequilibrium (LD), recent co-segregation (CS) and pedigree relationships. Results from field data analyses show that prediction accuracy is usually much higher for individuals that are close relatives of the training population than for distantly related individuals. A possible reason is that historical LD between QTL and SNPs is weak and, for close relatives, prediction accuracy of SNP models is mainly contributed by pedigree relationships and CS. Information from pedigree relationships decreases rapidly over generations and only contributes to within-family prediction. Information from CS is affected by family structures and effective population size, and can make a substantial contribution to prediction accuracy when modeled explicitly. RESULTS: In this study, a method to explicitly model CS was developed by following the transmission of putative QTL alleles using allele origins at SNPs. Bayesian hierarchical models that combine information from LD and CS (LD-CS model) were developed for genomic prediction in pedigree populations. Contributions of LD and CS information to prediction accuracy across families and generations without retraining were investigated in simulated half-sib datasets and deep pedigrees with different recent effective population sizes, respectively. Results from half-sib datasets showed that when historical LD between QTL and SNPs was low, accuracy of the LD model decreased when the training data size was increased by adding independent sire families, but accuracies from the CS and LD-CS models increased and plateaued rapidly. Results from deep pedigree datasets showed that the LD model had high accuracy across generations only when historical LD between QTL and SNPs was high. Modeling CS explicitly resulted in higher accuracy than the LD model across generations when the mating design generated many close relatives. CONCLUSIONS: Our results suggest that modeling CS explicitly improves accuracy of genomic prediction when historical LD between QTL and SNPs is low. Modeling both LD and CS explicitly is expected to improve accuracy when recent effective population size is small, or when the training data include many independent families. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12711-016-0255-4) contains supplementary material, which is available to authorized users. |
---|