Prioritizing candidate eQTL causal genes in Arabidopsis using RANDOM FORESTS
Main authors: | Hartanto, Margi; Sami, Asif Ahmed; de Ridder, Dick; Nijveen, Harm |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Investigation |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9635658/ https://www.ncbi.nlm.nih.gov/pubmed/36149290 http://dx.doi.org/10.1093/g3journal/jkac255 |
author | Hartanto, Margi; Sami, Asif Ahmed; de Ridder, Dick; Nijveen, Harm |
collection | PubMed |
description | Expression quantitative trait locus mapping has been widely used to study the genetic regulation of gene expression in Arabidopsis thaliana. As a result, a large amount of expression quantitative trait locus data has been generated for this model plant; however, only a few causal expression quantitative trait locus genes have been identified, and experimental validation is costly and laborious. A prioritization method could help speed up the identification of causal expression quantitative trait locus genes. This study extends QTG-Finder2, a machine-learning method for prioritizing candidate causal genes in phenotype quantitative trait loci, to expression quantitative trait loci by adding gene structure, protein interaction, and gene expression features. Independent validation shows that the new algorithm ranks 16 out of 25 potential expression quantitative trait locus causal genes within the top 20% of candidates. Several of the new features are important for prioritizing causal expression quantitative trait locus genes, including the number of protein–protein interactions, unique domains, and introns. Overall, this study provides a foundation for developing computational methods to prioritize candidate expression quantitative trait locus causal genes. Predictions for all genes are available in the AraQTL workbench (https://www.bioinformatics.nl/AraQTL/) to support the identification of gene expression regulators in Arabidopsis. |
format | Online Article Text |
id | pubmed-9635658 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
journal | G3 (Bethesda), article type: Investigation |
publication date | 2022-09-23 (record pubmed-9635658, dated 2022-11-07) |
rights | © The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Prioritizing candidate eQTL causal genes in Arabidopsis using RANDOM FORESTS |
topic | Investigation |
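The validation criterion in the abstract, counting a potential causal gene as prioritized when it ranks within the top 20% of candidates, can be illustrated with a short Python sketch. It assumes that each eQTL locus is given as a mapping from gene ID to a model score (for example, a random forest probability), that a gene's rank is expressed as its position divided by the number of genes in the locus, and that ties are broken by gene ID. The function names and gene IDs are hypothetical and are not taken from QTG-Finder2 or AraQTL.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers illustrating the "top 20% rank" criterion only;
# they are not the QTG-Finder2 or AraQTL implementation.


def rank_fraction(scores: Dict[str, float], gene: str) -> float:
    """Rank of `gene` within its locus divided by the number of genes.

    Rank 1 is the highest score. Ties are broken by gene ID so the
    ordering is deterministic (an assumption, not the published rule).
    """
    ordered = sorted(scores, key=lambda g: (-scores[g], g))
    return (ordered.index(gene) + 1) / len(ordered)


def in_top_fraction(scores: Dict[str, float], gene: str, fraction: float = 0.2) -> bool:
    """True if `gene` ranks within the top `fraction` of its locus."""
    return rank_fraction(scores, gene) <= fraction


def count_prioritized(loci: List[Tuple[Dict[str, float], str]], fraction: float = 0.2) -> int:
    """Count loci whose known causal gene ranks within the top `fraction`."""
    return sum(in_top_fraction(scores, causal, fraction) for scores, causal in loci)


if __name__ == "__main__":
    # Toy locus of 10 genes with made-up scores; higher means more likely causal.
    locus = {f"GENE{i:02d}": i / 10 for i in range(10)}
    print(rank_fraction(locus, "GENE09"))  # → 0.1 (top-scoring gene, rank 1 of 10)
    print(count_prioritized([(locus, "GENE09"), (locus, "GENE00")]))  # → 1
```

Expressing rank as a fraction of the locus size keeps the criterion comparable across eQTL intervals that contain very different numbers of genes.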