Use of the Multivariate Discriminant Analysis for Genome-Wide Association Studies in Cattle

Bibliographic Details
Main authors: Manca, Elisabetta; Cesarani, Alberto; Gaspa, Giustino; Sorbolini, Silvia; Macciotta, Nicolò P.P.; Dimauro, Corrado
Format: Online Article Text
Language: English
Published: MDPI 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7460480/
https://www.ncbi.nlm.nih.gov/pubmed/32751408
http://dx.doi.org/10.3390/ani10081300
Description
Summary: SIMPLE SUMMARY: In the traditional single marker regression approach for genome-wide association studies, if the number of individuals involved is small and the number of single nucleotide polymorphisms (SNPs) to be tested is very large, the probability of obtaining a significant association purely by chance becomes very high. Other techniques, such as Bayesian methods, require several a priori assumptions, such as a posterior inclusion probability threshold fixed in advance, that can limit their effectiveness. In the present study, a multivariate algorithm able to partially overcome this problem was proposed. On simulated data with 3000 individuals, only 13 and 3 quantitative trait loci (QTLs) were detected with the single marker regression and the Bayesian approaches, respectively, whereas the multivariate algorithm detected 65 QTLs in the same scenario. The gap between the single marker regression and the multivariate methods slowly decreased as the number of animals increased. This pattern was also confirmed on real data.

ABSTRACT: Genome-wide association studies (GWAS) are traditionally carried out using the single marker regression model, which often leads to very few associations when a small number of individuals is involved. Bayesian methods, such as BayesR, have given encouraging results when applied to GWAS. However, these approaches require that a posterior inclusion probability threshold be fixed a priori, thus arbitrarily affecting the associations obtained. To partially overcome these problems, a multivariate statistical algorithm was proposed. The basic idea was that animals with different phenotypic values for a specific trait carry different allelic combinations at the genes involved in its determination. Three multivariate techniques were used to highlight the differences between individuals assigned to high and low phenotype groups: canonical discriminant analysis, discriminant analysis, and stepwise discriminant analysis. The multivariate method was tested on both simulated and real data. The simulation results showed that the multivariate GWAS detected a greater number of truly associated SNPs and QTLs than the single marker model and the Bayesian approach. For example, with 3000 animals, the traditional GWAS highlighted only 29 significantly associated markers and 13 QTLs, whereas the multivariate method found 127 associated SNPs and 65 QTLs. The gap between the two approaches slowly decreased as the number of animals increased. The Bayesian method gave worse results than the other two. With the real data, the multivariate GWAS found, on average, 108 associated markers for each trait under study, and around 63% of these SNPs were also found by the single marker approach. Among the top 118 associated markers, 76 SNPs harbored putative candidate genes.
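
The idea of contrasting high and low phenotype groups can be illustrated with a toy example. The sketch below is not the authors' algorithm: it is a minimal two-group linear discriminant on simulated genotypes, written in plain Python. The sample size, the number of SNPs, the causal markers, the effect sizes, the quarter-based group definition, the ridge term, and all function names are assumptions made for illustration only.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def discriminant_scores(high, low, ridge=1e-3):
    """Standardized two-group discriminant coefficients, one per SNP.

    Solves S_w w = mean_high - mean_low, where S_w is the pooled
    within-group covariance (plus a small ridge term, an assumption
    added here for numerical stability).
    """
    p = len(high[0])
    mean_high, mean_low = column_means(high), column_means(low)
    sw = [[0.0] * p for _ in range(p)]
    for group, mean in ((high, mean_high), (low, mean_low)):
        for row in group:
            d = [x - mu for x, mu in zip(row, mean)]
            for i in range(p):
                for j in range(p):
                    sw[i][j] += d[i] * d[j]
    dof = len(high) + len(low) - 2
    for i in range(p):
        for j in range(p):
            sw[i][j] /= dof
        sw[i][i] += ridge
    diff = [h - l for h, l in zip(mean_high, mean_low)]
    w = solve(sw, diff)
    # Scale each coefficient by the within-group standard deviation.
    return [w[j] * sw[j][j] ** 0.5 for j in range(p)]


# Simulated data (all values hypothetical).
random.seed(1)
N_ANIMALS, N_SNPS = 400, 6
CAUSAL = {0, 3}  # indices of the SNPs that affect the phenotype

animals = []
for _ in range(N_ANIMALS):
    # Genotype coded as the number of copies of one allele (0, 1 or 2).
    g = [random.randint(0, 1) + random.randint(0, 1) for _ in range(N_SNPS)]
    phenotype = sum(2.0 * g[j] for j in CAUSAL) + random.gauss(0.0, 1.0)
    animals.append((phenotype, g))

# Assign the bottom and top quarters to the low and high groups.
animals.sort(key=lambda a: a[0])
quarter = N_ANIMALS // 4
low = [g for _, g in animals[:quarter]]
high = [g for _, g in animals[-quarter:]]

scores = discriminant_scores(high, low)
ranked = sorted(range(N_SNPS), key=lambda j: abs(scores[j]), reverse=True)
top_two = set(ranked[:2])
print("SNPs ranked by |standardized coefficient|:", ranked)
```

In this toy setting the two simulated causal SNPs are expected to receive the largest absolute coefficients, because the group means differ most at those markers. A real analysis involves far more markers than animals, which is one reason a variable-selection step such as stepwise discriminant analysis is needed before a discriminant function can be estimated.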