Prioritizing candidate eQTL causal genes in Arabidopsis using RANDOM FORESTS

Bibliographic Details
Main authors: Hartanto, Margi; Sami, Asif Ahmed; de Ridder, Dick; Nijveen, Harm
Format: Online article, text
Language: English
Published: Oxford University Press, 2022
Subjects: Investigation
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9635658/
https://www.ncbi.nlm.nih.gov/pubmed/36149290
http://dx.doi.org/10.1093/g3journal/jkac255
Collection: PubMed
Description: Expression quantitative trait locus mapping has been widely used to study the genetic regulation of gene expression in Arabidopsis thaliana. As a result, a large amount of expression quantitative trait locus data has been generated for this model plant; however, only a few causal expression quantitative trait locus genes have been identified, and experimental validation is costly and laborious. A prioritization method could help speed up the identification of causal expression quantitative trait locus genes. This study extends QTG-Finder2, a machine-learning-based method for prioritizing candidate causal genes in phenotype quantitative trait loci, to expression quantitative trait loci by adding gene structure, protein interaction, and gene expression features. Independent validation shows that the new algorithm places 16 of 25 potential expression quantitative trait locus causal genes within the top 20% of the ranking. Several of the new features are important for prioritizing causal expression quantitative trait locus genes, including the number of protein–protein interactions, unique domains, and introns. Overall, this study provides a foundation for developing computational methods to prioritize candidate expression quantitative trait locus causal genes. Predictions for all genes are available in the AraQTL workbench (https://www.bioinformatics.nl/AraQTL/) to support the identification of gene expression regulators in Arabidopsis.
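The validation figure in the description (16 of 25 causal genes within the top 20%) can be read as a per-locus rank check. The sketch below is a minimal, hypothetical illustration of that kind of metric. It assumes the following:

- Each eQTL is a dictionary mapping candidate genes to classifier scores, for example random forest probabilities.
- "Top 20%" means the causal gene's rank divided by the number of candidates in its locus is at most 0.2.

The function names, the tie handling, and the toy scores are illustrative only. They are not taken from QTG-Finder2, the AraQTL workbench, or the study's data.

```python
from typing import Dict, Sequence


def rank_fraction(scores: Dict[str, float], gene: str) -> float:
    """Rank of `gene` within one locus divided by the number of candidates.

    1/n means the gene scored highest; 1.0 means it scored lowest.
    Ties are counted pessimistically: any other gene with an equal or
    higher score is treated as ranked ahead of `gene`.
    """
    target = scores[gene]
    ahead = sum(1 for other, s in scores.items() if other != gene and s >= target)
    return (ahead + 1) / len(scores)


def count_in_top(
    loci: Sequence[Dict[str, float]],
    causal: Sequence[str],
    fraction: float = 0.2,
) -> int:
    """Count loci whose known causal gene falls within the top `fraction` of its locus."""
    return sum(
        1
        for scores, gene in zip(loci, causal)
        if rank_fraction(scores, gene) <= fraction
    )


# Hypothetical toy scores (e.g. classifier probabilities); not data from the study.
locus_a = {"a1": 0.91, "a2": 0.85, "a3": 0.40, "a4": 0.33, "a5": 0.30,
           "a6": 0.22, "a7": 0.15, "a8": 0.10, "a9": 0.05, "a10": 0.01}
locus_b = {"b1": 0.70, "b2": 0.60, "b3": 0.50, "b4": 0.20, "b5": 0.10}

# a2 ranks 2nd of 10 (0.2, counted); b3 ranks 3rd of 5 (0.6, not counted).
print(count_in_top([locus_a, locus_b], ["a2", "b3"]))  # prints 1
```

The ties are broken pessimistically so that equal scores cannot inflate the count. An average-rank convention is an equally reasonable alternative if the study's exact definition differs.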
ID: pubmed-9635658
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: G3 (Bethesda), Investigation section; published online 2022-09-23 by Oxford University Press (PMC9635658; PubMed 36149290; DOI 10.1093/g3journal/jkac255)
Rights: © The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.