Using Machine Learning Approaches to Predict Target Gene Expression in Rice T-DNA Insertional Mutants
Inserting T-DNA into the genome to change the expression of flanking genes is a common technique in rice functional gene research. However, whether the expression of a gene of interest is enhanced must be validated experimentally. Consequently, to improve the efficiency of screening activated gene...
Main authors: | Chien, Ching-Hsuan; Huang, Lan-Ying; Lo, Shuen-Fang; Chen, Liang-Jwu; Liao, Chi-Chou; Chen, Jia-Jyun; Chu, Yen-Wei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8718795/ https://www.ncbi.nlm.nih.gov/pubmed/34976025 http://dx.doi.org/10.3389/fgene.2021.798107 |
_version_ | 1784624803786784768 |
---|---|
author | Chien, Ching-Hsuan Huang, Lan-Ying Lo, Shuen-Fang Chen, Liang-Jwu Liao, Chi-Chou Chen, Jia-Jyun Chu, Yen-Wei |
author_facet | Chien, Ching-Hsuan Huang, Lan-Ying Lo, Shuen-Fang Chen, Liang-Jwu Liao, Chi-Chou Chen, Jia-Jyun Chu, Yen-Wei |
author_sort | Chien, Ching-Hsuan |
collection | PubMed |
description | Inserting T-DNA into the genome to change the expression of flanking genes is a common technique in rice functional gene research. However, whether the expression of a gene of interest is enhanced must be validated experimentally. To improve the efficiency of screening activated genes, we established a model that predicts gene expression in T-DNA mutants using machine learning methods. We gathered experimental datasets of gene expression in T-DNA mutants and extracted the PROMOTER and MIDDLE sequences for encoding. In the first layer, support vector machine (SVM) models were constructed with nine features covering biological function and local and global sequence information. Feature encoding based on the PROMOTER sequence was weighted by logistic regression. The second layer integrated 16 first-layer models using minimum redundancy maximum relevance (mRMR) feature selection and the LADTree algorithm, which were selected from nine feature selection methods and 65 classification methods, respectively. The accuracy of the final two-layer machine learning model, referred to as TIMgo, was 99.3% based on fivefold cross-validation and 85.6% based on independent testing. We found that local sequence information contributed more to classification than global sequence information. TIMgo showed good predictive ability for target genes within 20 kb of the 35S enhancer. Based on the analysis of significant sequences, the G-box regulatory sequence may also play an important role in the activation mechanism of the 35S enhancer. |
format | Online Article Text |
id | pubmed-8718795 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-87187952022-01-01 Using Machine Learning Approaches to Predict Target Gene Expression in Rice T-DNA Insertional Mutants Chien, Ching-Hsuan Huang, Lan-Ying Lo, Shuen-Fang Chen, Liang-Jwu Liao, Chi-Chou Chen, Jia-Jyun Chu, Yen-Wei Front Genet Genetics To change the expression of the flanking genes by inserting T-DNA into the genome is commonly used in rice functional gene research. However, whether the expression of a gene of interest is enhanced must be validated experimentally. Consequently, to improve the efficiency of screening activated genes, we established a model to predict gene expression in T-DNA mutants through machine learning methods. We gathered experimental datasets consisting of gene expression data in T-DNA mutants and captured the PROMOTER and MIDDLE sequences for encoding. In first-layer models, support vector machine (SVM) models were constructed with nine features consisting of information about biological function and local and global sequences. Feature encoding based on the PROMOTER sequence was weighted by logistic regression. The second-layer models integrated 16 first-layer models with minimum redundancy maximum relevance (mRMR) feature selection and the LADTree algorithm, which were selected from nine feature selection methods and 65 classified methods, respectively. The accuracy of the final two-layer machine learning model, referred to as TIMgo, was 99.3% based on fivefold cross-validation, and 85.6% based on independent testing. We discovered that the information within the local sequence had a greater contribution than the global sequence with respect to classification. TIMgo had a good predictive ability for target genes within 20 kb from the 35S enhancer. Based on the analysis of significant sequences, the G-box regulatory sequence may also play an important role in the activation mechanism of the 35S enhancer. Frontiers Media S.A. 
2021-12-17 /pmc/articles/PMC8718795/ /pubmed/34976025 http://dx.doi.org/10.3389/fgene.2021.798107 Text en Copyright © 2021 Chien, Huang, Lo, Chen, Liao, Chen and Chu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Chien, Ching-Hsuan Huang, Lan-Ying Lo, Shuen-Fang Chen, Liang-Jwu Liao, Chi-Chou Chen, Jia-Jyun Chu, Yen-Wei Using Machine Learning Approaches to Predict Target Gene Expression in Rice T-DNA Insertional Mutants |
title | Using Machine Learning Approaches to Predict Target Gene Expression in Rice T-DNA Insertional Mutants |
title_sort | using machine learning approaches to predict target gene expression in rice t-dna insertional mutants |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8718795/ https://www.ncbi.nlm.nih.gov/pubmed/34976025 http://dx.doi.org/10.3389/fgene.2021.798107 |
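
The description above outlines a two-layer design: several first-layer classifiers, each trained on its own feature group, feed their outputs to a second-layer classifier, and the result is scored by fivefold cross-validation and by independent testing. The following is a minimal sketch of that general idea only, not the TIMgo implementation. It makes several assumptions: a nearest-centroid classifier stands in for both the SVM and LADTree models, mRMR feature selection and logistic-regression weighting are omitted, the data are synthetic, and every name (`kfold_indices`, `CentroidClassifier`, `fit_two_layer`, `predict_two_layer`, `make_toy`, `groups`) is hypothetical.

```python
import random
from statistics import mean


def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class CentroidClassifier:
    """Nearest-centroid classifier: a simple stand-in, NOT an SVM or LADTree."""

    def fit(self, X, y):
        self.centroids = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c]))
                for x in X]


def select(X, cols):
    """Keep only the given feature columns of every row."""
    return [[row[c] for c in cols] for row in X]


def fit_two_layer(X, y, groups, k=5):
    """Train one first-layer model per feature group, then a second-layer
    model on their out-of-fold predictions."""
    n = len(X)
    meta = [[0.0] * len(groups) for _ in range(n)]
    for g, cols in enumerate(groups):
        Xg = select(X, cols)
        for fold in kfold_indices(n, k):
            held = set(fold)
            train = [i for i in range(n) if i not in held]
            model = CentroidClassifier().fit([Xg[i] for i in train],
                                             [y[i] for i in train])
            preds = model.predict([Xg[i] for i in fold])
            for i, p in zip(fold, preds):
                meta[i][g] = float(p)  # prediction made without seeing row i
    first = [CentroidClassifier().fit(select(X, cols), y) for cols in groups]
    second = CentroidClassifier().fit(meta, y)
    return first, second


def predict_two_layer(first, second, groups, X):
    """Run every first-layer model, then classify their stacked outputs."""
    per_model = [m.predict(select(X, cols)) for m, cols in zip(first, groups)]
    meta = [[float(p) for p in row] for row in zip(*per_model)]
    return second.predict(meta)


def make_toy(n, rng):
    """Synthetic two-class data with four made-up numeric features."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(3.0 * label, 0.5) for _ in range(4)])
        y.append(label)
    return X, y


rng = random.Random(1)
X_train, y_train = make_toy(60, rng)
X_test, y_test = make_toy(20, rng)
groups = [[0, 1], [2, 3]]  # hypothetical "local" and "global" feature groups

first, second = fit_two_layer(X_train, y_train, groups)
pred = predict_two_layer(first, second, groups, X_test)
acc = mean(1.0 if p == t else 0.0 for p, t in zip(pred, y_test))
print(f"independent-test accuracy on toy data: {acc:.3f}")
```

Training the second layer on out-of-fold predictions keeps it from seeing first-layer outputs for rows those models were fitted on, which is one common way to limit optimistic bias in stacked models. Whether the record's authors used this exact scheme is not stated in the description.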