
EAT-Rice: A predictive model for flanking gene expression of T-DNA insertion activation-tagged rice mutants by machine learning approaches

T-DNA activation-tagging technology is widely used to study rice gene functions. When T-DNA inserts into the genome, expression of the flanking genes may be altered by the CaMV 35S enhancer, but the affected genes still need to be validated by biological experiments. We have developed the EAT-Rice platform to...


Bibliographic Details
Main authors: Liao, Chi-Chou, Chen, Liang-Jwu, Lo, Shuen-Fang, Chen, Chi-Wei, Chu, Yen-Wei
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6505892/
https://www.ncbi.nlm.nih.gov/pubmed/31067213
http://dx.doi.org/10.1371/journal.pcbi.1006942
_version_ 1783416817426890752
author Liao, Chi-Chou
Chen, Liang-Jwu
Lo, Shuen-Fang
Chen, Chi-Wei
Chu, Yen-Wei
author_facet Liao, Chi-Chou
Chen, Liang-Jwu
Lo, Shuen-Fang
Chen, Chi-Wei
Chu, Yen-Wei
author_sort Liao, Chi-Chou
collection PubMed
description T-DNA activation-tagging technology is widely used to study rice gene functions. When T-DNA inserts into the genome, expression of the flanking genes may be altered by the CaMV 35S enhancer, but the affected genes still need to be validated by biological experiments. We have developed the EAT-Rice platform to predict flanking gene expression at T-DNA insertion sites in rice mutants. Three kinds of DNA sequences (UPS1K, DISTANCE, and MIDDLE) were retrieved and encoded to build a two-layer machine learning prediction model. In the first-layer models, the features nucleotide context (N-gram), cis-regulatory elements (Motif), nucleotide physicochemical properties (NPC), and CG-island (CGI) were used to build SVM models by analysing the concealed information embedded within the three kinds of sequences. Logistic regression was used to estimate the probability of gene activation, which served as a feature-encoding weight within the first-layer models. In the second-layer models, the NaiveBayesUpdateable algorithm was used to integrate these first-layer models, and the system performance was 88.33% on 5-fold cross-validation and 79.17% on independent testing. Among the three kinds of sequences, the model built from MIDDLE contributed most to the system's identification of activated genes. Compared with the TRIM database, the EAT-Rice system provided better performance and predicted gene expression at greater distances. An online server based on EAT-Rice is available at http://predictor.nchu.edu.tw/EAT-Rice.
format Online
Article
Text
id pubmed-6505892
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6505892 2019-05-23 EAT-Rice: A predictive model for flanking gene expression of T-DNA insertion activation-tagged rice mutants by machine learning approaches Liao, Chi-Chou Chen, Liang-Jwu Lo, Shuen-Fang Chen, Chi-Wei Chu, Yen-Wei PLoS Comput Biol Research Article T-DNA activation-tagging technology is widely used to study rice gene functions. When T-DNA inserts into the genome, expression of the flanking genes may be altered by the CaMV 35S enhancer, but the affected genes still need to be validated by biological experiments. We have developed the EAT-Rice platform to predict flanking gene expression at T-DNA insertion sites in rice mutants. Three kinds of DNA sequences (UPS1K, DISTANCE, and MIDDLE) were retrieved and encoded to build a two-layer machine learning prediction model. In the first-layer models, the features nucleotide context (N-gram), cis-regulatory elements (Motif), nucleotide physicochemical properties (NPC), and CG-island (CGI) were used to build SVM models by analysing the concealed information embedded within the three kinds of sequences. Logistic regression was used to estimate the probability of gene activation, which served as a feature-encoding weight within the first-layer models. In the second-layer models, the NaiveBayesUpdateable algorithm was used to integrate these first-layer models, and the system performance was 88.33% on 5-fold cross-validation and 79.17% on independent testing. Among the three kinds of sequences, the model built from MIDDLE contributed most to the system's identification of activated genes. Compared with the TRIM database, the EAT-Rice system provided better performance and predicted gene expression at greater distances. An online server based on EAT-Rice is available at http://predictor.nchu.edu.tw/EAT-Rice.
Public Library of Science 2019-05-08 /pmc/articles/PMC6505892/ /pubmed/31067213 http://dx.doi.org/10.1371/journal.pcbi.1006942 Text en © 2019 Liao et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Liao, Chi-Chou
Chen, Liang-Jwu
Lo, Shuen-Fang
Chen, Chi-Wei
Chu, Yen-Wei
EAT-Rice: A predictive model for flanking gene expression of T-DNA insertion activation-tagged rice mutants by machine learning approaches
title EAT-Rice: A predictive model for flanking gene expression of T-DNA insertion activation-tagged rice mutants by machine learning approaches
title_full EAT-Rice: A predictive model for flanking gene expression of T-DNA insertion activation-tagged rice mutants by machine learning approaches
title_fullStr EAT-Rice: A predictive model for flanking gene expression of T-DNA insertion activation-tagged rice mutants by machine learning approaches
title_full_unstemmed EAT-Rice: A predictive model for flanking gene expression of T-DNA insertion activation-tagged rice mutants by machine learning approaches
title_short EAT-Rice: A predictive model for flanking gene expression of T-DNA insertion activation-tagged rice mutants by machine learning approaches
title_sort eat-rice: a predictive model for flanking gene expression of t-dna insertion activation-tagged rice mutants by machine learning approaches
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6505892/
https://www.ncbi.nlm.nih.gov/pubmed/31067213
http://dx.doi.org/10.1371/journal.pcbi.1006942
work_keys_str_mv AT liaochichou eatriceapredictivemodelforflankinggeneexpressionoftdnainsertionactivationtaggedricemutantsbymachinelearningapproaches
AT chenliangjwu eatriceapredictivemodelforflankinggeneexpressionoftdnainsertionactivationtaggedricemutantsbymachinelearningapproaches
AT loshuenfang eatriceapredictivemodelforflankinggeneexpressionoftdnainsertionactivationtaggedricemutantsbymachinelearningapproaches
AT chenchiwei eatriceapredictivemodelforflankinggeneexpressionoftdnainsertionactivationtaggedricemutantsbymachinelearningapproaches
AT chuyenwei eatriceapredictivemodelforflankinggeneexpressionoftdnainsertionactivationtaggedricemutantsbymachinelearningapproaches
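
Illustrative note on the description field: the abstract names nucleotide context (N-gram) and CG-island (CGI) among the first-layer features but does not say how they are encoded. The Python sketch below shows one plausible encoding, under stated assumptions. The function names `ngram_frequencies` and `cpg_observed_expected`, the default n = 2, and the use of the common observed/expected CpG ratio are all assumptions made for illustration; they are not taken from the record and may differ from what the EAT-Rice authors implemented.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def ngram_frequencies(seq: str, n: int = 2) -> dict:
    """Relative frequency of every possible DNA n-gram in seq (hypothetical helper)."""
    seq = seq.upper()
    # Overlapping windows of length n; windows with non-ACGT symbols (e.g. N) are skipped.
    windows = (seq[i:i + n] for i in range(len(seq) - n + 1))
    counts = Counter(w for w in windows if set(w) <= set(ALPHABET))
    total = sum(counts.values())
    # Fixed key order gives a fixed-length feature vector of size 4**n.
    keys = ["".join(p) for p in product(ALPHABET, repeat=n)]
    return {k: (counts[k] / total if total else 0.0) for k in keys}


def cpg_observed_expected(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G). Assumed CGI statistic."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


if __name__ == "__main__":
    features = ngram_frequencies("ACGT", n=2)
    print(len(features))                   # 16
    print(cpg_observed_expected("CGCG"))   # 2.0
```

A vector of this kind, computed for each of the three sequence types (UPS1K, DISTANCE, MIDDLE), is the sort of input a first-layer classifier could consume; the classifier itself is left out here because the record gives no parameters for it.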