EAT-Rice: A predictive model for flanking gene expression of T-DNA insertion activation-tagged rice mutants by machine learning approaches
T-DNA activation-tagging technology is widely used to study rice gene functions. When T-DNA inserts into the genome, the expression of flanking genes may be altered by the CaMV 35S enhancer, but the affected genes still need to be validated by biological experiments. We have developed the EAT-Rice platform to...
Main Authors: | Liao, Chi-Chou; Chen, Liang-Jwu; Lo, Shuen-Fang; Chen, Chi-Wei; Chu, Yen-Wei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2019 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6505892/ https://www.ncbi.nlm.nih.gov/pubmed/31067213 http://dx.doi.org/10.1371/journal.pcbi.1006942 |
_version_ | 1783416817426890752 |
---|---|
author | Liao, Chi-Chou Chen, Liang-Jwu Lo, Shuen-Fang Chen, Chi-Wei Chu, Yen-Wei |
author_facet | Liao, Chi-Chou Chen, Liang-Jwu Lo, Shuen-Fang Chen, Chi-Wei Chu, Yen-Wei |
author_sort | Liao, Chi-Chou |
collection | PubMed |
description | T-DNA activation-tagging technology is widely used to study rice gene functions. When T-DNA inserts into the genome, the expression of flanking genes may be altered by the CaMV 35S enhancer, but the affected genes still need to be validated by biological experiments. We have developed the EAT-Rice platform to predict the flanking gene expression of T-DNA insertion sites in rice mutants. Three kinds of DNA sequences, UPS1K, DISTANCE, and MIDDLE, were retrieved and encoded to build a two-layer machine learning prediction model. In the first-layer models, the features nucleotide context (N-gram), cis-regulatory elements (Motif), nucleotide physicochemical properties (NPC), and CG-island (CGI) were used to build SVM models by analysing the information embedded within the three kinds of sequences. Logistic regression was used to estimate the probability of gene activation, which served as the feature-encoding weighting within the first-layer models. In the second-layer models, the NaiveBayesUpdateable algorithm was used to integrate these first-layer models; the system performance was 88.33% on 5-fold cross-validation and 79.17% on independent testing. Among the three kinds of sequences, the model constructed from MIDDLE contributed most to the system for identifying activated genes. The EAT-Rice system provided better performance, and gene expression prediction at greater distances, when compared to the TRIM database. An online server based on EAT-Rice is available at http://predictor.nchu.edu.tw/EAT-Rice. |
format | Online Article Text |
id | pubmed-6505892 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-65058922019-05-23 EAT-Rice: A predictive model for flanking gene expression of T-DNA insertion activation-tagged rice mutants by machine learning approaches Liao, Chi-Chou Chen, Liang-Jwu Lo, Shuen-Fang Chen, Chi-Wei Chu, Yen-Wei PLoS Comput Biol Research Article T-DNA activation-tagging technology is widely used to study rice gene functions. When T-DNA inserts into genome, the flanking gene expression may be altered using CaMV 35S enhancer, but the affected genes still need to be validated by biological experiment. We have developed the EAT-Rice platform to predict the flanking gene expression of T-DNA insertion site in rice mutants. The three kinds of DNA sequences including UPS1K, DISTANCE, and MIDDLE were retrieved to encode and build a forecast model of two-layer machine learning. In the first-layer models, the features nucleotide context (N-gram), cis-regulatory elements (Motif), nucleotide physicochemical properties (NPC), and CG-island (CGI) were used to build SVM models by analysing the concealed information embedded within the three kinds of sequences. Logistic regression was used to estimate the probability of gene activation which as feature-encoding weighting within first-layer model. In the second-layer models, the NaiveBayesUpdateable algorithm was used to integrate these first layer-models, and the system performance was 88.33% on 5-fold cross-validation, and 79.17% on independent-testing finally. In the three kinds of sequences, the model constructed by Middle had the best contribution to the system for identifying the activated genes. The EAT-Rice system provided better performance and gene expression prediction at further distances when compared to the TRIM database. An online server based on EAT-rice is available at http://predictor.nchu.edu.tw/EAT-Rice. 
Public Library of Science 2019-05-08 /pmc/articles/PMC6505892/ /pubmed/31067213 http://dx.doi.org/10.1371/journal.pcbi.1006942 Text en © 2019 Liao et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Liao, Chi-Chou Chen, Liang-Jwu Lo, Shuen-Fang Chen, Chi-Wei Chu, Yen-Wei EAT-Rice: A predictive model for flanking gene expression of T-DNA insertion activation-tagged rice mutants by machine learning approaches |
title | EAT-Rice: A predictive model for flanking gene expression of T-DNA insertion activation-tagged rice mutants by machine learning approaches |
title_full | EAT-Rice: A predictive model for flanking gene expression of T-DNA insertion activation-tagged rice mutants by machine learning approaches |
title_fullStr | EAT-Rice: A predictive model for flanking gene expression of T-DNA insertion activation-tagged rice mutants by machine learning approaches |
title_full_unstemmed | EAT-Rice: A predictive model for flanking gene expression of T-DNA insertion activation-tagged rice mutants by machine learning approaches |
title_short | EAT-Rice: A predictive model for flanking gene expression of T-DNA insertion activation-tagged rice mutants by machine learning approaches |
title_sort | eat-rice: a predictive model for flanking gene expression of t-dna insertion activation-tagged rice mutants by machine learning approaches |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6505892/ https://www.ncbi.nlm.nih.gov/pubmed/31067213 http://dx.doi.org/10.1371/journal.pcbi.1006942 |
work_keys_str_mv | AT liaochichou eatriceapredictivemodelforflankinggeneexpressionoftdnainsertionactivationtaggedricemutantsbymachinelearningapproaches AT chenliangjwu eatriceapredictivemodelforflankinggeneexpressionoftdnainsertionactivationtaggedricemutantsbymachinelearningapproaches AT loshuenfang eatriceapredictivemodelforflankinggeneexpressionoftdnainsertionactivationtaggedricemutantsbymachinelearningapproaches AT chenchiwei eatriceapredictivemodelforflankinggeneexpressionoftdnainsertionactivationtaggedricemutantsbymachinelearningapproaches AT chuyenwei eatriceapredictivemodelforflankinggeneexpressionoftdnainsertionactivationtaggedricemutantsbymachinelearningapproaches |
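
The description field above names N-gram and CG-island (CGI) features among the inputs to the first-layer models. The following is a minimal sketch of how such sequence features are commonly encoded, not the EAT-Rice implementation: the function names `ngram_frequencies` and `cpg_stats`, the choice of `n = 2`, and the use of the standard CpG observed/expected ratio are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def ngram_frequencies(seq: str, n: int = 2) -> dict[str, float]:
    """Relative frequency of every length-n nucleotide word in seq (hypothetical helper)."""
    seq = seq.upper()
    windows = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    # Skip windows containing ambiguous bases such as N.
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values())
    # Enumerate all 4**n words so every sequence yields the same vector layout.
    words = ("".join(p) for p in product("ACGT", repeat=n))
    return {w: (counts[w] / total if total else 0.0) for w in words}


def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC content, CpG observed/expected ratio) for seq (hypothetical helper)."""
    seq = seq.upper()
    length = len(seq)
    if length == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")
    # Expected CpG count if C and G were placed independently.
    expected = c * g / length
    ratio = observed / expected if expected else 0.0
    return (c + g) / length, ratio
```

For a sequence such as `"ACGT"`, `ngram_frequencies` returns a 16-entry vector in which `AC`, `CG`, and `GT` each have frequency 1/3. Vectors like these could be concatenated per sequence type (UPS1K, DISTANCE, MIDDLE) before being passed to a classifier; how EAT-Rice actually combines them is not stated in this record.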