SaPt-CNN-LSTM-AR-EA: a hybrid ensemble learning framework for time series-based multivariate DNA sequence prediction
Main authors: | Yan, Wu; Tan, Li; Meng-Shan, Li; Sheng, Sheng; Jun, Wang; Fu-an, Wu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc., 2023 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10559882/ https://www.ncbi.nlm.nih.gov/pubmed/37810796 http://dx.doi.org/10.7717/peerj.16192 |
author | Yan, Wu; Tan, Li; Meng-Shan, Li; Sheng, Sheng; Jun, Wang; Fu-an, Wu |
---|---|
collection | PubMed |
description | Biological sequence data mining is a hot spot in bioinformatics. A biological sequence can be regarded as a string of characters, and time series are similar to biological sequences in both representation and mechanism. Therefore, this article represents biological sequences as time series to obtain biological time sequences (BTS) and proposes a hybrid ensemble learning framework, SaPt-CNN-LSTM-AR-EA, for BTS prediction. The single-sequence model is built with a self-adaptive pre-training one-dimensional convolutional recurrent neural network, and the multi-sequence model with an autoregressive fractionally integrated moving average model fused with an evolutionary algorithm. In DNA sequence experiments on six viruses, SaPt-CNN-LSTM-AR-EA achieved good overall prediction performance, with prediction accuracy and correlation reaching 1.7073 and 0.9186, respectively. To verify its effectiveness and stability, SaPt-CNN-LSTM-AR-EA was compared with five other benchmark models, and it increased the average accuracy by about 30%. The proposed framework is significant for biology, biomedicine, and computer science, and can be widely applied in sequence splicing, computational biology, bioinformatics, and other fields. |
format | Online Article Text |
id | pubmed-10559882 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10559882, 2023-10-08. PeerJ, Bioinformatics. PeerJ Inc. 2023-10-04. © 2023 Yan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
title | SaPt-CNN-LSTM-AR-EA: a hybrid ensemble learning framework for time series-based multivariate DNA sequence prediction |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10559882/ https://www.ncbi.nlm.nih.gov/pubmed/37810796 http://dx.doi.org/10.7717/peerj.16192 |
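The description states only that DNA sequences are "represented with time series" to obtain a BTS; it does not say how bases become numbers. The Python sketch below illustrates the general idea under an explicit assumption: each base is mapped to a hypothetical ordinal value (`NUCLEOTIDE_CODE`), and the resulting series is cut into fixed-width windows paired with the next value, one common way to frame one-step-ahead forecasting for a sequence model. The names `dna_to_series` and `sliding_windows` are illustrative, not taken from the article, and this is not the authors' SaPt-CNN-LSTM-AR-EA implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical base-to-number mapping (assumption: the article's actual
# encoding is not given in this record; an ordinal code is used here only
# to illustrate turning a DNA string into a numeric series).
NUCLEOTIDE_CODE: Dict[str, float] = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def dna_to_series(seq: str) -> List[float]:
    """Map a DNA string to a numeric series, one value per base."""
    seq = seq.upper()
    unknown = set(seq) - set(NUCLEOTIDE_CODE)
    if unknown:
        # Reject ambiguity codes such as N instead of guessing a value.
        raise ValueError(f"unsupported symbols: {sorted(unknown)}")
    return [NUCLEOTIDE_CODE[base] for base in seq]


def sliding_windows(
    series: List[float], width: int
) -> Tuple[List[List[float]], List[float]]:
    """Pair each window of `width` values with the value that follows it."""
    if not 1 <= width < len(series):
        raise ValueError("width must satisfy 1 <= width < len(series)")
    inputs = [series[i:i + width] for i in range(len(series) - width)]
    targets = series[width:]  # target for window i is series[i + width]
    return inputs, targets


if __name__ == "__main__":
    s = dna_to_series("ACGTTGCA")
    X, y = sliding_windows(s, 3)
    print(X[0], y[0])
```

For example, `dna_to_series("ACGTTGCA")` yields `[1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]`, and a window width of 3 gives five input windows, the first being `[1.0, 2.0, 3.0]` with target `4.0`. An ordinal code imposes an artificial ordering on the bases; a different numeric mapping can be substituted by changing the dictionary alone.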