
Fuzzy Logic as a Strategy for Combining Marker Statistics to Optimize Preselection of High-Density and Sequence Genotype Data

Bibliographic Details
Main authors: Ling, Ashley; Hay, El Hamidi; Aggrey, Samuel E.; Rekaya, Romdhane
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9690945/
https://www.ncbi.nlm.nih.gov/pubmed/36421775
http://dx.doi.org/10.3390/genes13112100
Description
Summary: The high dimensionality of genotype data available for genomic evaluations has motivated the development of strategies to identify subsets of markers capable of increasing the accuracy of predictions compared to the current commercial single nucleotide polymorphism (SNP) chips. In this simulation study, an algorithm was developed that combines statistics used in the preselection and prioritization of SNP markers from a high-density panel (1.3 million SNPs) into a composite “fuzzy” ranking score based on a Sugeno-type fuzzy inference system (FIS), and its performance in preselection for genomic predictions was evaluated. $F_{ST}$ scores and p-values were evaluated as inputs for the FIS. The accuracy of genomic predictions for fuzzy-score-preselected panel sizes of 1–50 k SNPs was −0.4 to 11.7% and −0.3 to 3.8% higher than for $F_{ST}$ and p-value preselection, respectively. Though gains in prediction accuracy using only two inputs to the FIS were modest, preselection based on fuzzy scores yielded more accurate predictions than both $F_{ST}$ scores and p-values for the majority of evaluated panel sizes under all genetic architectures. FIS have the potential to aggregate information from multiple criteria that reflect SNP-trait associations and biological relevance in a flexible and efficient way to yield higher quality genomic predictions.
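
To make the idea of a composite fuzzy ranking score concrete, the following is a minimal sketch of a zero-order Sugeno-type FIS with two inputs. It is not the authors' algorithm: the function names (`fuzzy_score`, `preselect`), the min-max rescaling, the linear "low"/"high" membership functions, the product rule for firing strengths, and the rule consequents are all illustrative assumptions, since the summary does not specify them.

```python
import numpy as np

def _minmax(x):
    """Rescale a vector to [0, 1]; a constant vector maps to all zeros."""
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span

def fuzzy_score(fst, pval, consequents=(0.0, 0.3, 0.3, 1.0)):
    """Combine per-SNP F_ST scores and p-values into one score in [0, 1].

    Assumed (hypothetical) rule base, one constant consequent per rule:
      F_ST low  AND significance low  -> consequents[0]
      F_ST low  AND significance high -> consequents[1]
      F_ST high AND significance low  -> consequents[2]
      F_ST high AND significance high -> consequents[3]
    """
    fst = np.asarray(fst, dtype=float)
    pval = np.asarray(pval, dtype=float)

    # Smaller p-values mean stronger evidence, so work with -log10(p).
    signif = -np.log10(np.clip(pval, 1e-300, 1.0))

    # Membership in "high" is a linear ramp on the rescaled input;
    # membership in "low" is its complement (assumed shapes).
    h1, h2 = _minmax(fst), _minmax(signif)
    l1, l2 = 1.0 - h1, 1.0 - h2

    # Firing strength of each rule, using the product as the AND operator.
    w = np.stack([l1 * l2, l1 * h2, h1 * l2, h1 * h2])
    z = np.asarray(consequents, dtype=float)[:, None]

    # Sugeno defuzzification: firing-strength-weighted average of consequents.
    return (w * z).sum(axis=0) / w.sum(axis=0)

def preselect(fst, pval, k):
    """Return indices of the k SNPs with the highest fuzzy score."""
    scores = fuzzy_score(fst, pval)
    return np.argsort(-scores, kind="stable")[:k]
```

Calling `preselect(fst, pval, k)` on per-SNP vectors would return the indices of a panel of size `k`. In this sketch a SNP scores highest when both statistics are high, while a SNP that is high on only one criterion receives a reduced score; other rule bases would encode different trade-offs.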