StressGenePred: a twin prediction model architecture for classifying the stress types of samples and discovering stress-related genes in arabidopsis

BACKGROUND: Recently, a number of studies have been conducted to investigate how plants respond to stress at the cellular molecular level by measuring gene expression profiles over time. As a result, a set of time-series gene expression data for the stress response are available in databases. With t...

Bibliographic Details
Main Authors: Kang, Dongwon; Ahn, Hongryul; Lee, Sangseon; Lee, Chai-Jin; Hur, Jihye; Jung, Woosuk; Kim, Sun
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6923958/
https://www.ncbi.nlm.nih.gov/pubmed/31856731
http://dx.doi.org/10.1186/s12864-019-6283-z
author Kang, Dongwon
Ahn, Hongryul
Lee, Sangseon
Lee, Chai-Jin
Hur, Jihye
Jung, Woosuk
Kim, Sun
collection PubMed
description BACKGROUND: Recently, a number of studies have been conducted to investigate how plants respond to stress at the cellular and molecular level by measuring gene expression profiles over time. As a result, sets of time-series gene expression data for the stress response are available in databases. With these data, an integrated analysis of multiple stresses is possible, which identifies stress-responsive genes with higher specificity because considering multiple stresses can capture the effect of interference between stresses. To analyze such data, a machine learning model needs to be built. RESULTS: In this study, we developed StressGenePred, a neural network-based machine learning method, to integrate time-series transcriptome data of multiple stress types. StressGenePred is designed to detect single stress-specific biomarker genes by using a simple feature embedding method, a twin neural network model, and Confident Multiple Choice Learning (CMCL) loss. The twin neural network model consists of a biomarker gene discovery model and a stress type prediction model that share the same logical layer to reduce training complexity. The CMCL loss is used to make the twin model select biomarker genes that respond specifically to a single stress. In experiments using Arabidopsis gene expression data for four major environmental stresses (heat, cold, salt, and drought), StressGenePred classified the types of stress more accurately than the limma feature embedding method and the support vector machine and random forest classification methods. In addition, StressGenePred discovered known stress-related genes with higher specificity than the Fisher method. CONCLUSIONS: StressGenePred is a machine learning method for identifying stress-related genes and predicting stress types for an integrated analysis of multiple stress time-series transcriptome data. This method can be applied to other phenotype-gene association studies.
format Online
Article
Text
id pubmed-6923958
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6923958 2019-12-30 StressGenePred: a twin prediction model architecture for classifying the stress types of samples and discovering stress-related genes in arabidopsis Kang, Dongwon Ahn, Hongryul Lee, Sangseon Lee, Chai-Jin Hur, Jihye Jung, Woosuk Kim, Sun BMC Genomics Research BACKGROUND: Recently, a number of studies have been conducted to investigate how plants respond to stress at the cellular and molecular level by measuring gene expression profiles over time. As a result, sets of time-series gene expression data for the stress response are available in databases. With these data, an integrated analysis of multiple stresses is possible, which identifies stress-responsive genes with higher specificity because considering multiple stresses can capture the effect of interference between stresses. To analyze such data, a machine learning model needs to be built. RESULTS: In this study, we developed StressGenePred, a neural network-based machine learning method, to integrate time-series transcriptome data of multiple stress types. StressGenePred is designed to detect single stress-specific biomarker genes by using a simple feature embedding method, a twin neural network model, and Confident Multiple Choice Learning (CMCL) loss. The twin neural network model consists of a biomarker gene discovery model and a stress type prediction model that share the same logical layer to reduce training complexity. The CMCL loss is used to make the twin model select biomarker genes that respond specifically to a single stress. In experiments using Arabidopsis gene expression data for four major environmental stresses (heat, cold, salt, and drought), StressGenePred classified the types of stress more accurately than the limma feature embedding method and the support vector machine and random forest classification methods. In addition, StressGenePred discovered known stress-related genes with higher specificity than the Fisher method. CONCLUSIONS: StressGenePred is a machine learning method for identifying stress-related genes and predicting stress types for an integrated analysis of multiple stress time-series transcriptome data. This method can be applied to other phenotype-gene association studies. BioMed Central 2019-12-20 /pmc/articles/PMC6923958/ /pubmed/31856731 http://dx.doi.org/10.1186/s12864-019-6283-z Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Kang, Dongwon
Ahn, Hongryul
Lee, Sangseon
Lee, Chai-Jin
Hur, Jihye
Jung, Woosuk
Kim, Sun
StressGenePred: a twin prediction model architecture for classifying the stress types of samples and discovering stress-related genes in arabidopsis
title StressGenePred: a twin prediction model architecture for classifying the stress types of samples and discovering stress-related genes in arabidopsis
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6923958/
https://www.ncbi.nlm.nih.gov/pubmed/31856731
http://dx.doi.org/10.1186/s12864-019-6283-z