
SaPt-CNN-LSTM-AR-EA: a hybrid ensemble learning framework for time series-based multivariate DNA sequence prediction

Biological sequence data mining is a hot spot in bioinformatics. A biological sequence can be regarded as a set of characters. Time series are similar to biological sequences in both representation and mechanism. Therefore, in this article, biological sequences are represented as time series...


Bibliographic Details
Main authors: Yan, Wu, Tan, Li, Meng-Shan, Li, Sheng, Sheng, Jun, Wang, Fu-an, Wu
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2023
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10559882/
https://www.ncbi.nlm.nih.gov/pubmed/37810796
http://dx.doi.org/10.7717/peerj.16192
_version_ 1785117603132342272
author Yan, Wu
Tan, Li
Meng-Shan, Li
Sheng, Sheng
Jun, Wang
Fu-an, Wu
author_facet Yan, Wu
Tan, Li
Meng-Shan, Li
Sheng, Sheng
Jun, Wang
Fu-an, Wu
author_sort Yan, Wu
collection PubMed
description Biological sequence data mining is a hot spot in bioinformatics. A biological sequence can be regarded as a set of characters. Time series are similar to biological sequences in both representation and mechanism. Therefore, in this article, biological sequences are represented as time series to obtain biological time sequences (BTS). A hybrid ensemble learning framework (SaPt-CNN-LSTM-AR-EA) for BTS is proposed. Single-sequence and multi-sequence models are constructed, respectively, with a self-adaption pre-training one-dimensional convolutional recurrent neural network and an autoregressive fractionally integrated moving average model fused with an evolutionary algorithm. In DNA sequence experiments with six viruses, SaPt-CNN-LSTM-AR-EA achieved good overall prediction performance, with prediction accuracy and correlation reaching 1.7073 and 0.9186, respectively. SaPt-CNN-LSTM-AR-EA was compared with five other benchmark models to verify its effectiveness and stability, and it increased the average accuracy by about 30%. The framework proposed in this article is significant in biology, biomedicine, and computer science, and can be widely applied in sequence splicing, computational biology, bioinformation, and other fields.
format Online
Article
Text
id pubmed-10559882
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-10559882 2023-10-08 SaPt-CNN-LSTM-AR-EA: a hybrid ensemble learning framework for time series-based multivariate DNA sequence prediction Yan, Wu Tan, Li Meng-Shan, Li Sheng, Sheng Jun, Wang Fu-an, Wu PeerJ Bioinformatics Biological sequence data mining is a hot spot in bioinformatics. A biological sequence can be regarded as a set of characters. Time series are similar to biological sequences in both representation and mechanism. Therefore, in this article, biological sequences are represented as time series to obtain biological time sequences (BTS). A hybrid ensemble learning framework (SaPt-CNN-LSTM-AR-EA) for BTS is proposed. Single-sequence and multi-sequence models are constructed, respectively, with a self-adaption pre-training one-dimensional convolutional recurrent neural network and an autoregressive fractionally integrated moving average model fused with an evolutionary algorithm. In DNA sequence experiments with six viruses, SaPt-CNN-LSTM-AR-EA achieved good overall prediction performance, with prediction accuracy and correlation reaching 1.7073 and 0.9186, respectively. SaPt-CNN-LSTM-AR-EA was compared with five other benchmark models to verify its effectiveness and stability, and it increased the average accuracy by about 30%. The framework proposed in this article is significant in biology, biomedicine, and computer science, and can be widely applied in sequence splicing, computational biology, bioinformation, and other fields. PeerJ Inc. 2023-10-04 /pmc/articles/PMC10559882/ /pubmed/37810796 http://dx.doi.org/10.7717/peerj.16192 Text en © 2023 Yan et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed.
For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Yan, Wu
Tan, Li
Meng-Shan, Li
Sheng, Sheng
Jun, Wang
Fu-an, Wu
SaPt-CNN-LSTM-AR-EA: a hybrid ensemble learning framework for time series-based multivariate DNA sequence prediction
title SaPt-CNN-LSTM-AR-EA: a hybrid ensemble learning framework for time series-based multivariate DNA sequence prediction
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10559882/
https://www.ncbi.nlm.nih.gov/pubmed/37810796
http://dx.doi.org/10.7717/peerj.16192