Classification and Regression Models for Genomic Selection of Skewed Phenotypes: A Case for Disease Resistance in Winter Wheat (Triticum aestivum L.)

Most genomic prediction models are linear regression models that assume continuous and normally distributed phenotypes, but responses to diseases such as stripe rust (caused by Puccinia striiformis f. sp. tritici) are commonly recorded in ordinal scales and percentages. Disease severity (SEV) and in...

Bibliographic Details
Main authors: Merrick, Lance F., Lozada, Dennis N., Chen, Xianming, Carter, Arron H.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8904966/
https://www.ncbi.nlm.nih.gov/pubmed/35281841
http://dx.doi.org/10.3389/fgene.2022.835781
author Merrick, Lance F.
Lozada, Dennis N.
Chen, Xianming
Carter, Arron H.
collection PubMed
description Most genomic prediction models are linear regression models that assume continuous and normally distributed phenotypes, but responses to diseases such as stripe rust (caused by Puccinia striiformis f. sp. tritici) are commonly recorded in ordinal scales and percentages. Disease severity (SEV) and infection type (IT) data in germplasm screening nurseries generally do not follow these assumptions. In this regard, researchers may ignore the lack of normality, transform the phenotypes, use generalized linear models, or use supervised learning algorithms and classification models with no restriction on the distribution of response variables, which are less sensitive when modeling ordinal scores. The goal of this research was to compare classification and regression genomic selection models for skewed phenotypes using stripe rust SEV and IT in winter wheat. We extensively compared both regression and classification prediction models using two training populations composed of breeding lines phenotyped in 4 years (2016–2018 and 2020) and a diversity panel phenotyped in 4 years (2013–2016). The prediction models used 19,861 genotyping-by-sequencing single-nucleotide polymorphism markers. Overall, square root transformed phenotypes using ridge regression best linear unbiased prediction and support vector machine regression models displayed the highest combination of accuracy and relative efficiency across the regression and classification models. Furthermore, a classification system based on support vector machine and ordinal Bayesian models with a 2-Class scale for SEV reached the highest class accuracy of 0.99. This study showed that breeders can use linear and non-parametric regression models within their own breeding lines over combined years to accurately predict skewed phenotypes.
format Online
Article
Text
id pubmed-8904966
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8904966 2022-03-10 Classification and Regression Models for Genomic Selection of Skewed Phenotypes: A Case for Disease Resistance in Winter Wheat (Triticum aestivum L.) Merrick, Lance F. Lozada, Dennis N. Chen, Xianming Carter, Arron H. Front Genet Genetics Frontiers Media S.A. 2022-02-23 /pmc/articles/PMC8904966/ /pubmed/35281841 http://dx.doi.org/10.3389/fgene.2022.835781 Text en Copyright © 2022 Merrick, Lozada, Chen and Carter. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Classification and Regression Models for Genomic Selection of Skewed Phenotypes: A Case for Disease Resistance in Winter Wheat (Triticum aestivum L.)
topic Genetics