Using Machine Learning Approaches to Predict Target Gene Expression in Rice T-DNA Insertional Mutants

Inserting T-DNA into the genome to change the expression of flanking genes is a common approach in rice functional gene research. However, whether the expression of a gene of interest is enhanced must be validated experimentally. Consequently, to improve the efficiency of screening activated gene...

Bibliographic Details
Main authors: Chien, Ching-Hsuan, Huang, Lan-Ying, Lo, Shuen-Fang, Chen, Liang-Jwu, Liao, Chi-Chou, Chen, Jia-Jyun, Chu, Yen-Wei
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8718795/
https://www.ncbi.nlm.nih.gov/pubmed/34976025
http://dx.doi.org/10.3389/fgene.2021.798107
author Chien, Ching-Hsuan
Huang, Lan-Ying
Lo, Shuen-Fang
Chen, Liang-Jwu
Liao, Chi-Chou
Chen, Jia-Jyun
Chu, Yen-Wei
collection PubMed
description Inserting T-DNA into the genome to change the expression of flanking genes is a common approach in rice functional gene research. However, whether the expression of a gene of interest is enhanced must be validated experimentally. To improve the efficiency of screening activated genes, we therefore established a machine learning model to predict gene expression in T-DNA mutants. We gathered experimental datasets consisting of gene expression data in T-DNA mutants and captured the PROMOTER and MIDDLE sequences for encoding. In the first layer, support vector machine (SVM) models were constructed with nine features carrying information about biological function and local and global sequences. Feature encoding based on the PROMOTER sequence was weighted by logistic regression. The second layer integrated 16 first-layer models using minimum redundancy maximum relevance (mRMR) feature selection and the LADTree algorithm, which were selected from nine feature selection methods and 65 classification methods, respectively. The accuracy of the final two-layer machine learning model, referred to as TIMgo, was 99.3% in fivefold cross-validation and 85.6% in independent testing. We found that information within the local sequence contributed more to classification than the global sequence. TIMgo showed good predictive ability for target genes within 20 kb of the 35S enhancer. Based on the analysis of significant sequences, the G-box regulatory sequence may also play an important role in the activation mechanism of the 35S enhancer.
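The description mentions sequence-based feature encoding and the G-box regulatory sequence but does not spell out either. The following is only a minimal illustrative sketch, not the authors' method. It assumes a simple k-mer frequency encoding (the record does not say which encodings TIMgo uses) and the canonical G-box core motif CACGTG. The function names `kmer_frequencies` and `motif_positions` and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Canonical G-box core motif; assumed here, the record itself gives no sequence.
G_BOX = "CACGTG"


def kmer_frequencies(seq, k=2):
    """Encode a DNA sequence as a fixed-length list of relative k-mer frequencies.

    The vector has 4**k entries ordered lexicographically over A, C, G, T.
    Windows containing other characters (e.g. N) are simply not counted
    in any entry, so the vector may then sum to less than 1.
    """
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[kmer] / total if total else 0.0 for kmer in alphabet]


def motif_positions(seq, motif=G_BOX):
    """Return the 0-based start positions of every occurrence of motif in seq."""
    seq = seq.upper()
    m = len(motif)
    return [i for i in range(len(seq) - m + 1) if seq[i:i + m] == motif]


if __name__ == "__main__":
    promoter = "ttCACGTGaaCACGTG"  # made-up sequence for illustration only
    print(len(kmer_frequencies(promoter, k=2)))  # length of the dinucleotide vector
    print(motif_positions(promoter))             # start positions of G-box hits
```

A vector like this could serve as one input to a classifier such as an SVM. The actual feature definitions would have to be taken from the full article.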
format Online
Article
Text
id pubmed-8718795
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8718795 2022-01-01 Front Genet Genetics Frontiers Media S.A. 2021-12-17 /pmc/articles/PMC8718795/ /pubmed/34976025 http://dx.doi.org/10.3389/fgene.2021.798107 Text en
Copyright © 2021 Chien, Huang, Lo, Chen, Liao, Chen and Chu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Using Machine Learning Approaches to Predict Target Gene Expression in Rice T-DNA Insertional Mutants
topic Genetics