Accurate prediction of quantitative traits with failed SNP calls in canola and maize
In modern plant breeding, genomic selection is becoming the gold standard to select superior genotypes in large breeding populations that are only partially phenotyped. Many breeding programs commonly rely on single-nucleotide polymorphism (SNP) markers to capture genome-wide data for selection cand...
Main authors: | Weber, Sven E.; Chawla, Harmeet Singh; Ehrig, Lennard; Hickey, Lee T.; Frisch, Matthias; Snowdon, Rod J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Plant Science |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10627008/ https://www.ncbi.nlm.nih.gov/pubmed/37936929 http://dx.doi.org/10.3389/fpls.2023.1221750 |
_version_ | 1785131450545209344 |
---|---|
author | Weber, Sven E. Chawla, Harmeet Singh Ehrig, Lennard Hickey, Lee T. Frisch, Matthias Snowdon, Rod J. |
author_sort | Weber, Sven E. |
collection | PubMed |
description | In modern plant breeding, genomic selection is becoming the gold standard to select superior genotypes in large breeding populations that are only partially phenotyped. Many breeding programs commonly rely on single-nucleotide polymorphism (SNP) markers to capture genome-wide data for selection candidates. For this purpose, SNP arrays with moderate to high marker density represent a robust and cost-effective tool to generate reproducible, easy-to-handle, high-throughput genotype data from large-scale breeding populations. However, SNP arrays are prone to technical errors that lead to failed allele calls. To overcome this problem, failed calls are often imputed, based on the assumption that failed SNP calls are purely technical. However, this ignores biological causes of failed calls, such as deletions, and there is increasing evidence that gene presence–absence and other kinds of genome structural variants can play a role in phenotypic expression. Because deletions are frequently not in linkage disequilibrium with their flanking SNPs, imputation of missing SNP calls can potentially obscure valuable marker–trait associations. In this study, we analyze published datasets for canola and maize using four parametric and two machine learning models and demonstrate that failed allele calls in genomic prediction are highly predictive of important agronomic traits. We present two statistical pipelines, based on population structure and linkage disequilibrium, that enable the filtering of failed SNP calls that are likely to have biological causes. For the population and trait examined, prediction accuracy based on these filtered failed allele calls was competitive with standard SNP-based prediction, underlining the potential value of missing data in genomic prediction approaches. The combination of SNPs with all failed allele calls or the filtered allele calls did not outperform SNP-only prediction, owing to redundancy in genomic relationship estimates. |
format | Online Article Text |
id | pubmed-10627008 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10627008 2023-11-07 Accurate prediction of quantitative traits with failed SNP calls in canola and maize Weber, Sven E. Chawla, Harmeet Singh Ehrig, Lennard Hickey, Lee T. Frisch, Matthias Snowdon, Rod J. Front Plant Sci Plant Science In modern plant breeding, genomic selection is becoming the gold standard to select superior genotypes in large breeding populations that are only partially phenotyped. Many breeding programs commonly rely on single-nucleotide polymorphism (SNP) markers to capture genome-wide data for selection candidates. For this purpose, SNP arrays with moderate to high marker density represent a robust and cost-effective tool to generate reproducible, easy-to-handle, high-throughput genotype data from large-scale breeding populations. However, SNP arrays are prone to technical errors that lead to failed allele calls. To overcome this problem, failed calls are often imputed, based on the assumption that failed SNP calls are purely technical. However, this ignores biological causes of failed calls, such as deletions, and there is increasing evidence that gene presence–absence and other kinds of genome structural variants can play a role in phenotypic expression. Because deletions are frequently not in linkage disequilibrium with their flanking SNPs, imputation of missing SNP calls can potentially obscure valuable marker–trait associations. In this study, we analyze published datasets for canola and maize using four parametric and two machine learning models and demonstrate that failed allele calls in genomic prediction are highly predictive of important agronomic traits. We present two statistical pipelines, based on population structure and linkage disequilibrium, that enable the filtering of failed SNP calls that are likely to have biological causes. For the population and trait examined, prediction accuracy based on these filtered failed allele calls was competitive with standard SNP-based prediction, underlining the potential value of missing data in genomic prediction approaches. The combination of SNPs with all failed allele calls or the filtered allele calls did not outperform SNP-only prediction, owing to redundancy in genomic relationship estimates. Frontiers Media S.A. 2023-10-23 /pmc/articles/PMC10627008/ /pubmed/37936929 http://dx.doi.org/10.3389/fpls.2023.1221750 Text en Copyright © 2023 Weber, Chawla, Ehrig, Hickey, Frisch and Snowdon https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Plant Science Weber, Sven E. Chawla, Harmeet Singh Ehrig, Lennard Hickey, Lee T. Frisch, Matthias Snowdon, Rod J. Accurate prediction of quantitative traits with failed SNP calls in canola and maize |
title | Accurate prediction of quantitative traits with failed SNP calls in canola and maize |
topic | Plant Science |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10627008/ https://www.ncbi.nlm.nih.gov/pubmed/37936929 http://dx.doi.org/10.3389/fpls.2023.1221750 |