Omics-CNN: A comprehensive pipeline for predictive analytics in quantitative omics using one-dimensional convolutional neural networks

BACKGROUND AND OBJECTIVE: Developing machine learning-based models for predicting severe diseases has been one of the main concerns of the scientific community. The current study seeks to expand a highly sophisticated tool, Convolutional Neural Networks, making it...

Bibliographic Details
Main Authors: Zompola, Anastasia; Korfiati, Aigli; Theofilatos, Konstantinos; Mavroudi, Seferina
Format: Online Article Text
Language: English
Published: Elsevier 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10658203/
https://www.ncbi.nlm.nih.gov/pubmed/38027840
http://dx.doi.org/10.1016/j.heliyon.2023.e21165
_version_ 1785137367139483648
author Zompola, Anastasia
Korfiati, Aigli
Theofilatos, Konstantinos
Mavroudi, Seferina
collection PubMed
description BACKGROUND AND OBJECTIVE: Developing machine learning-based models for predicting severe diseases has been one of the main concerns of the scientific community. The current study seeks to extend Convolutional Neural Networks, a highly sophisticated tool, to multidimensional omics data classification problems, and tests the newly introduced method on publicly available transcriptomics and proteomics data. METHODS: In this study, we introduce Omics-CNN, a pipeline that couples Convolutional Neural Networks with dimensionality reduction, preprocessing, clustering, and explainability techniques so that they can be used to build highly accurate and interpretable classification models from high-throughput omics data. The tool can classify patients according to the expression of genetic and clinical factors and identify features that can act as diagnostic biomarkers. For dimensionality reduction, univariate and multivariate techniques were explored and compared. After training the model, Gradient-weighted Class Activation Mapping (Grad-CAM) analysis was performed to determine the features most important for classifying the samples. RESULTS: The pipeline was applied to one transcriptomics and one proteomics dataset to identify diagnostic models and biosignatures for Ischemic Stroke (IS) and COVID-19 infection, yielding biosignatures with accuracies of 96 % and 95.41 %, respectively. Classification models based solely on a small subset of attributes provided lower predictive accuracy but identified a compact transcript biosignature (KRT15, VPRBP, TNFRSF4, GORASP2) for Ischemic Stroke diagnosis and a compact protein biosignature (ADGRB3, VNN2, AGER, CIAPIN1) for COVID-19 infection diagnosis.
CONCLUSIONS: Omics-CNN overcame the inherent problems of applying Convolutional Neural Networks to train diagnostic models with quantitative omics data, outperformed previous machine learning models developed on the same datasets for Ischemic Stroke and COVID-19 infection diagnosis, and determined the most contributing biomarkers for both diseases.
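The abstract names Gradient-weighted Class Activation Mapping on a one-dimensional CNN as the way important features are found. The following is a minimal sketch of that idea only, not the authors' implementation: the architecture (one convolution, ReLU, global average pooling, one linear layer), the function names, and every number in the example are assumptions made up for illustration. For this particular architecture the gradient of a class score with respect to a feature map has a closed form, so no deep learning library is needed.

```python
"""Toy Grad-CAM for a 1D CNN over a single expression profile.

Hypothetical architecture (an assumption, not taken from the paper):
conv1d (valid, stride 1) -> ReLU -> global average pooling -> linear layer.
"""


def conv1d_relu(x, kernels, biases):
    """Return one ReLU-activated feature map per kernel (valid convolution)."""
    maps = []
    for kernel, bias in zip(kernels, biases):
        width = len(kernel)
        fmap = []
        for start in range(len(x) - width + 1):
            window = x[start:start + width]
            value = bias + sum(k * v for k, v in zip(kernel, window))
            fmap.append(max(0.0, value))
        maps.append(fmap)
    return maps


def class_scores(maps, dense_weights):
    """Average each map over positions, then apply one weight row per class."""
    pooled = [sum(m) / len(m) for m in maps]
    return [sum(w * p for w, p in zip(row, pooled)) for row in dense_weights]


def grad_cam_1d(maps, dense_weights, target_class):
    """Weight each map by the mean gradient of the class score, sum, apply ReLU.

    With global average pooling followed by a linear layer,
    d(score_c) / d(map_k[i]) = dense_weights[c][k] / length at every position i.
    """
    length = len(maps[0])
    alphas = []
    for k in range(len(maps)):
        grads = [dense_weights[target_class][k] / length] * length
        alphas.append(sum(grads) / length)  # average gradient over positions
    return [
        max(0.0, sum(a * m[i] for a, m in zip(alphas, maps)))
        for i in range(length)
    ]


def top_positions(cam, kernel_width, n=1):
    """Map the n highest-scoring positions back to the input indices they cover."""
    ranked = sorted(range(len(cam)), key=lambda i: cam[i], reverse=True)[:n]
    return [list(range(i, i + kernel_width)) for i in ranked]


# Made-up profile of six features with one strongly expressed feature (index 2).
profile = [0.1, 0.2, 3.0, 0.1, 0.0, 0.2]
kernels = [[1.0, 1.0], [-1.0, 1.0]]   # hypothetical, untrained filters
biases = [0.0, 0.0]
dense_weights = [[1.0, 0.0], [0.0, 1.0]]  # one row per class

feature_maps = conv1d_relu(profile, kernels, biases)
cam = grad_cam_1d(feature_maps, dense_weights, target_class=0)
print(class_scores(feature_maps, dense_weights))
print(top_positions(cam, kernel_width=2))
```

In this toy run the highest-scoring window covers input positions 1 and 2, which includes the strongly expressed feature. In a real pipeline the filters and weights would come from training, and the gradients would be obtained by automatic differentiation rather than the closed form used here.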
format Online
Article
Text
id pubmed-10658203
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10658203 2023-10-28 Omics-CNN: A comprehensive pipeline for predictive analytics in quantitative omics using one-dimensional convolutional neural networks Zompola, Anastasia Korfiati, Aigli Theofilatos, Konstantinos Mavroudi, Seferina Heliyon Research Article Elsevier 2023-10-28 /pmc/articles/PMC10658203/ /pubmed/38027840 http://dx.doi.org/10.1016/j.heliyon.2023.e21165 Text en © 2023 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Omics-CNN: A comprehensive pipeline for predictive analytics in quantitative omics using one-dimensional convolutional neural networks
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10658203/
https://www.ncbi.nlm.nih.gov/pubmed/38027840
http://dx.doi.org/10.1016/j.heliyon.2023.e21165