
Accurate prediction of quantitative traits with failed SNP calls in canola and maize

In modern plant breeding, genomic selection is becoming the gold standard to select superior genotypes in large breeding populations that are only partially phenotyped. Many breeding programs commonly rely on single-nucleotide polymorphism (SNP) markers to capture genome-wide data for selection cand...


Bibliographic Details
Main Authors: Weber, Sven E., Chawla, Harmeet Singh, Ehrig, Lennard, Hickey, Lee T., Frisch, Matthias, Snowdon, Rod J.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10627008/
https://www.ncbi.nlm.nih.gov/pubmed/37936929
http://dx.doi.org/10.3389/fpls.2023.1221750
author Weber, Sven E.
Chawla, Harmeet Singh
Ehrig, Lennard
Hickey, Lee T.
Frisch, Matthias
Snowdon, Rod J.
collection PubMed
description In modern plant breeding, genomic selection is becoming the gold standard for selecting superior genotypes in large breeding populations that are only partially phenotyped. Many breeding programs rely on single-nucleotide polymorphism (SNP) markers to capture genome-wide data for selection candidates. For this purpose, SNP arrays with moderate to high marker density represent a robust and cost-effective tool to generate reproducible, easy-to-handle, high-throughput genotype data from large-scale breeding populations. However, SNP arrays are prone to technical errors that lead to failed allele calls. To overcome this problem, failed calls are often imputed, based on the assumption that failed SNP calls are purely technical. However, this ignores biological causes of failed calls, such as deletions, and there is increasing evidence that gene presence–absence and other kinds of genome structural variants can play a role in phenotypic expression. Because deletions are frequently not in linkage disequilibrium with their flanking SNPs, imputation of missing SNP calls can potentially obscure valuable marker–trait associations. In this study, we analyze published datasets for canola and maize using four parametric and two machine learning models and demonstrate that failed allele calls in genomic prediction are highly predictive for important agronomic traits. We present two statistical pipelines, based on population structure and linkage disequilibrium, that enable the filtering of failed SNP calls that are likely to have biological causes. For the population and trait examined, prediction accuracy based on these filtered failed allele calls was competitive with standard SNP-based prediction, underlining the potential value of missing data in genomic prediction approaches. Combining SNPs with either all failed allele calls or the filtered failed allele calls did not outperform SNP-only prediction, owing to redundancy in genomic relationship estimates.
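The idea in the abstract, using failed calls themselves as markers, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the simple frequency filter (a stand-in for the population-structure and linkage-disequilibrium filters the abstract mentions), the relationship-matrix scaling, and the ridge penalty `lam` are all assumptions made for illustration. Failed calls are assumed to be stored as NaN in a lines-by-markers genotype matrix.

```python
import numpy as np

def failed_call_matrix(genotypes):
    """Return a 0/1 matrix: 1 where the SNP call failed (NaN), 0 otherwise."""
    return np.isnan(genotypes).astype(float)

def filter_markers(markers, min_freq=0.05):
    """Keep columns whose failure frequency lies in [min_freq, 1 - min_freq].

    Hypothetical stand-in for the filtering pipelines described in the abstract.
    """
    freq = markers.mean(axis=0)
    keep = (freq >= min_freq) & (freq <= 1.0 - min_freq)
    return markers[:, keep]

def relationship_matrix(markers):
    """Relationship matrix from column-centred markers, scaled by marker count."""
    centred = markers - markers.mean(axis=0)
    return centred @ centred.T / markers.shape[1]

def gblup_predict(K, y, train, test, lam=1.0):
    """Kernel-ridge (GBLUP-style) prediction of test phenotypes from training lines."""
    mu = y[train].mean()
    K_tt = K[np.ix_(train, train)]
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + K[np.ix_(test, train)] @ alpha

if __name__ == "__main__":
    # Toy data only: random genotypes with about 20% simulated failed calls.
    rng = np.random.default_rng(0)
    geno = rng.integers(0, 3, size=(20, 50)).astype(float)
    geno[rng.random(geno.shape) < 0.2] = np.nan
    y = rng.normal(size=20)
    K = relationship_matrix(filter_markers(failed_call_matrix(geno)))
    print(gblup_predict(K, y, train=list(range(15)), test=list(range(15, 20))))
```

In this sketch the SNP genotypes are never imputed; only the pattern of missingness enters the relationship matrix, which is the contrast the abstract draws with standard SNP-based prediction.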
format Online
Article
Text
id pubmed-10627008
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10627008 2023-11-07 Accurate prediction of quantitative traits with failed SNP calls in canola and maize Weber, Sven E. Chawla, Harmeet Singh Ehrig, Lennard Hickey, Lee T. Frisch, Matthias Snowdon, Rod J. Front Plant Sci Plant Science Frontiers Media S.A. 2023-10-23 /pmc/articles/PMC10627008/ /pubmed/37936929 http://dx.doi.org/10.3389/fpls.2023.1221750 Text en Copyright © 2023 Weber, Chawla, Ehrig, Hickey, Frisch and Snowdon https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Accurate prediction of quantitative traits with failed SNP calls in canola and maize
topic Plant Science