QTG-Finder2: A Generalized Machine-Learning Algorithm for Prioritizing QTL Causal Genes in Plants
Linkage mapping has been widely used to identify quantitative trait loci (QTL) in many plants and usually requires a time-consuming and labor-intensive fine mapping process to find the causal gene underlying the QTL. Previously, we described QTG-Finder, a machine-learning algorithm to rationally pri...
Main authors: | Lin, Fan, Lazarus, Elena Z., Rhee, Seung Y. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Genetics Society of America 2020 |
Subjects: | Investigations |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7341141/ https://www.ncbi.nlm.nih.gov/pubmed/32430305 http://dx.doi.org/10.1534/g3.120.401122 |
_version_ | 1783555171775676416 |
---|---|
author | Lin, Fan Lazarus, Elena Z. Rhee, Seung Y. |
author_facet | Lin, Fan Lazarus, Elena Z. Rhee, Seung Y. |
author_sort | Lin, Fan |
collection | PubMed |
description | Linkage mapping has been widely used to identify quantitative trait loci (QTL) in many plants and usually requires a time-consuming and labor-intensive fine mapping process to find the causal gene underlying the QTL. Previously, we described QTG-Finder, a machine-learning algorithm to rationally prioritize candidate causal genes in QTLs. While it showed good performance, QTG-Finder could only be used in Arabidopsis and rice because of the limited number of known causal genes in other species. Here we tested the feasibility of enabling QTG-Finder to work on species that have few or no known causal genes by using orthologs of known causal genes as the training set. The model trained with orthologs could recall about 64% of Arabidopsis and 83% of rice causal genes when the top 20% ranked genes were considered, which is similar to the performance of models trained with known causal genes. The average precision was 0.027 for Arabidopsis and 0.029 for rice. We further extended the algorithm to include polymorphisms in conserved non-coding sequences and gene presence/absence variation as additional features. Using this algorithm, QTG-Finder2, we trained and cross-validated Sorghum bicolor and Setaria viridis models. The S. bicolor model was validated by causal genes curated from the literature and could recall 70% of causal genes when the top 20% ranked genes were considered. In addition, we applied the S. viridis model and public transcriptome data to prioritize a plant height QTL and identified 13 candidate genes. QTG-Finder2 can accelerate the discovery of causal genes in any plant species and facilitate agricultural trait improvement. |
format | Online Article Text |
id | pubmed-7341141 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Genetics Society of America |
record_format | MEDLINE/PubMed |
spelling | pubmed-73411412020-07-21 QTG-Finder2: A Generalized Machine-Learning Algorithm for Prioritizing QTL Causal Genes in Plants Lin, Fan Lazarus, Elena Z. Rhee, Seung Y. G3 (Bethesda) Investigations Linkage mapping has been widely used to identify quantitative trait loci (QTL) in many plants and usually requires a time-consuming and labor-intensive fine mapping process to find the causal gene underlying the QTL. Previously, we described QTG-Finder, a machine-learning algorithm to rationally prioritize candidate causal genes in QTLs. While it showed good performance, QTG-Finder could only be used in Arabidopsis and rice because of the limited number of known causal genes in other species. Here we tested the feasibility of enabling QTG-Finder to work on species that have few or no known causal genes by using orthologs of known causal genes as the training set. The model trained with orthologs could recall about 64% of Arabidopsis and 83% of rice causal genes when the top 20% ranked genes were considered, which is similar to the performance of models trained with known causal genes. The average precision was 0.027 for Arabidopsis and 0.029 for rice. We further extended the algorithm to include polymorphisms in conserved non-coding sequences and gene presence/absence variation as additional features. Using this algorithm, QTG-Finder2, we trained and cross-validated Sorghum bicolor and Setaria viridis models. The S. bicolor model was validated by causal genes curated from the literature and could recall 70% of causal genes when the top 20% ranked genes were considered. In addition, we applied the S. viridis model and public transcriptome data to prioritize a plant height QTL and identified 13 candidate genes. QTG-Finder2 can accelerate the discovery of causal genes in any plant species and facilitate agricultural trait improvement. 
Genetics Society of America 2020-05-18 /pmc/articles/PMC7341141/ /pubmed/32430305 http://dx.doi.org/10.1534/g3.120.401122 Text en Copyright © 2020 Lin et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Investigations Lin, Fan Lazarus, Elena Z. Rhee, Seung Y. QTG-Finder2: A Generalized Machine-Learning Algorithm for Prioritizing QTL Causal Genes in Plants |
title | QTG-Finder2: A Generalized Machine-Learning Algorithm for Prioritizing QTL Causal Genes in Plants |
title_full | QTG-Finder2: A Generalized Machine-Learning Algorithm for Prioritizing QTL Causal Genes in Plants |
title_fullStr | QTG-Finder2: A Generalized Machine-Learning Algorithm for Prioritizing QTL Causal Genes in Plants |
title_full_unstemmed | QTG-Finder2: A Generalized Machine-Learning Algorithm for Prioritizing QTL Causal Genes in Plants |
title_short | QTG-Finder2: A Generalized Machine-Learning Algorithm for Prioritizing QTL Causal Genes in Plants |
title_sort | qtg-finder2: a generalized machine-learning algorithm for prioritizing qtl causal genes in plants |
topic | Investigations |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7341141/ https://www.ncbi.nlm.nih.gov/pubmed/32430305 http://dx.doi.org/10.1534/g3.120.401122 |
work_keys_str_mv | AT linfan qtgfinder2ageneralizedmachinelearningalgorithmforprioritizingqtlcausalgenesinplants AT lazaruselenaz qtgfinder2ageneralizedmachinelearningalgorithmforprioritizingqtlcausalgenesinplants AT rheeseungy qtgfinder2ageneralizedmachinelearningalgorithmforprioritizingqtlcausalgenesinplants |
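
The description above reports performance as the share of known causal genes recalled "when the top 20% ranked genes were considered". As a rough illustration of what that kind of metric measures, the following Python sketch computes recall within a top-ranked fraction. It is not the authors' code: the function name `recall_at_top_fraction`, its parameters, and the toy scores and labels are all hypothetical and were made up for this example.

```python
import math


def recall_at_top_fraction(scores, is_causal, fraction=0.2):
    """Share of known causal genes ranked within the top `fraction` of genes.

    Hypothetical helper for illustration only; not part of QTG-Finder2.
    `scores` are model scores (higher = more likely causal) and
    `is_causal` are 0/1 labels in the same gene order.
    """
    if len(scores) != len(is_causal):
        raise ValueError("scores and is_causal must have the same length")
    total_causal = sum(is_causal)
    if total_causal == 0:
        raise ValueError("at least one causal gene is required")

    # Size of the top slice: the given fraction of all genes, at least one.
    k = max(1, math.ceil(fraction * len(scores)))

    # Rank genes from highest to lowest score and count causal genes in the slice.
    ranked = sorted(zip(scores, is_causal), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])
    return hits / total_causal


# Invented toy data: 10 genes in a QTL, 2 of them labeled causal.
scores = [0.9, 0.1, 0.4, 0.8, 0.3, 0.2, 0.7, 0.05, 0.6, 0.5]
is_causal = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]

print(recall_at_top_fraction(scores, is_causal))       # top 20% = 2 genes
print(recall_at_top_fraction(scores, is_causal, 0.5))  # top 50% = 5 genes
```

In this toy case only one of the two causal genes falls within the top two ranks, so recall at 20% is 0.5, while widening the slice to 50% captures both. How the published study forms its candidate sets and averages across cross-validation runs is not stated in this record, so the sketch only conveys the general idea.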