Evaluation of classification and forecasting methods on time series gene expression data

Time series gene expression data is widely used to study different dynamic biological processes. Although gene expression datasets share many of the characteristics of time series data from other domains, most of the analyses in this field do not fully leverage the time-ordered nature of the data an...

Bibliographic Details
Main authors: Tripto, Nafis Irtiza; Kabir, Mohimenul; Bayzid, Md. Shamsuzzoha; Rahman, Atif
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7647064/
https://www.ncbi.nlm.nih.gov/pubmed/33156855
http://dx.doi.org/10.1371/journal.pone.0241686
author Tripto, Nafis Irtiza
Kabir, Mohimenul
Bayzid, Md. Shamsuzzoha
Rahman, Atif
collection PubMed
description Time series gene expression data is widely used to study different dynamic biological processes. Although gene expression datasets share many of the characteristics of time series data from other domains, most of the analyses in this field do not fully leverage the time-ordered nature of the data and focus on clustering the genes based on their expression values. Other domains, such as financial stock and weather prediction, utilize time series data for forecasting purposes. Moreover, many studies have been conducted to classify generic time series data based on trend, seasonality, and other patterns. Therefore, an assessment of these approaches on gene expression data would be of great interest to evaluate their adequacy in this domain. Here, we perform a comprehensive evaluation of different traditional unsupervised and supervised machine learning approaches as well as deep learning based techniques for time series gene expression classification and forecasting on five real datasets. In addition, we propose deep learning based methods for both classification and forecasting, and compare their performances with the state-of-the-art methods. We find that deep learning based methods generally outperform traditional approaches for time series classification. Experiments also suggest that supervised classification on gene expression is more effective than clustering when labels are available. In time series gene expression forecasting, we observe that an autoregressive statistical approach has the best performance for short term forecasting, whereas deep learning based methods are better suited for long term forecasting.
format Online
Article
Text
id pubmed-7647064
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7647064 2020-11-16 Evaluation of classification and forecasting methods on time series gene expression data Tripto, Nafis Irtiza; Kabir, Mohimenul; Bayzid, Md. Shamsuzzoha; Rahman, Atif PLoS One Research Article
Public Library of Science 2020-11-06 /pmc/articles/PMC7647064/ /pubmed/33156855 http://dx.doi.org/10.1371/journal.pone.0241686 Text en © 2020 Tripto et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Evaluation of classification and forecasting methods on time series gene expression data
topic Research Article