
Machine-Learning-Based Genome-Wide Association Studies for Uncovering QTL Underlying Soybean Yield and Its Components



Bibliographic Details
Main authors: Yoosefzadeh-Najafabadi, Mohsen; Eskandari, Milad; Torabi, Sepideh; Torkamaneh, Davoud; Tulpan, Dan; Rajcan, Istvan
Format: Online Article Text
Language: English
Published: MDPI 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9141736/
https://www.ncbi.nlm.nih.gov/pubmed/35628351
http://dx.doi.org/10.3390/ijms23105538
author Yoosefzadeh-Najafabadi, Mohsen
Eskandari, Milad
Torabi, Sepideh
Torkamaneh, Davoud
Tulpan, Dan
Rajcan, Istvan
collection PubMed
description A genome-wide association study (GWAS) is currently one of the most recommended approaches for discovering marker-trait associations (MTAs) for complex traits in plant species. Insufficient statistical power limits conventional GWAS methods, especially in species with a narrow genetic base. Sophisticated mathematical methods such as machine learning (ML) algorithms may address this issue and advance the application of this valuable genetic method in applied plant-breeding programs. In this study, we evaluated the potential use of two ML algorithms, support-vector regression (SVR) and random forest (RF), in a GWAS and compared them with two conventional methods, the mixed linear model (MLM) and fixed and random model circulating probability unification (FarmCPU), for identifying MTAs for soybean-yield components. Important soybean-yield component traits, including the number of reproductive nodes (RNP), non-reproductive nodes (NRNP), total nodes (NP), and total pods (PP) per plant, along with yield and maturity, were assessed in a panel of 227 soybean genotypes evaluated at two locations over two years (four environments). Using the SVR-mediated GWAS method, we discovered MTAs that colocalized with previously reported quantitative trait loci (QTL) and have potential causal effects on the target traits, as supported by functional annotation of the candidate genes. This study demonstrated the potential benefit of using sophisticated mathematical approaches, such as SVR, in a GWAS to complement conventional GWAS methods for identifying MTAs that can improve the efficiency of genomics-based soybean-breeding programs.
format Online
Article
Text
id pubmed-9141736
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9141736 2022-05-28 Int J Mol Sci Article
MDPI 2022-05-16 /pmc/articles/PMC9141736/ /pubmed/35628351 http://dx.doi.org/10.3390/ijms23105538 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Machine-Learning-Based Genome-Wide Association Studies for Uncovering QTL Underlying Soybean Yield and Its Components
topic Article