Word2vec neural model-based technique to generate protein vectors for combating COVID-19: a machine learning approach

The world was ambushed in 2019 by the COVID-19 virus which affected the health, economy, and lifestyle of individuals worldwide. One way of combating such a public health concern is by using appropriate, rapid, and unbiased diagnostic tools for quick detection of infected people. However, a current...

Descripción completa

Bibliographic details
Main authors: Adjuik, Toby A.; Ananey-Obiri, Daniel
Format: Online Article Text
Language: English
Published: Springer Nature Singapore 2022
Subjects: Original Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9119569/
https://www.ncbi.nlm.nih.gov/pubmed/35611155
http://dx.doi.org/10.1007/s41870-022-00949-2
Collection: PubMed

Abstract: The world was ambushed in 2019 by the COVID-19 virus, which affected the health, economy, and lifestyle of individuals worldwide. One way of combating such a public health concern is to use appropriate, rapid, and unbiased diagnostic tools to detect infected people quickly. However, the current dearth of bioinformatics tools necessitates modeling studies to help diagnose COVID-19 cases. Molecular methods for detecting COVID-19, such as real-time reverse transcription polymerase chain reaction (rRT-PCR), are time-consuming and prone to contamination. Modern bioinformatics tools have made it possible to build large databases of protein sequences from various diseases, apply data mining techniques to them, and diagnose diseases accurately. However, current sequence alignment tools that use these databases cannot detect novel COVID-19 viral sequences because of high sequence dissimilarity. The objective of this study, therefore, was to develop models that classify COVID-19 viral sequences rapidly and accurately using protein vectors generated by a neural word embedding technique (Word2vec). Five machine learning models were developed using datasets from the National Center for Biotechnology Information: K-nearest neighbor regression (KNN), support vector machine (SVM), random forest (RF), linear discriminant analysis (LDA), and logistic regression. Our results suggest that the RF model performed better than all other models on the training dataset, with 99% accuracy, and reached 99.5% accuracy on the testing dataset. The implication of this study is that rapid detection of the COVID-19 virus in suspected cases could potentially save lives, as less time would be needed to ascertain a patient's status.

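The abstract does not say how protein sequences are split into Word2vec "words" or how the word vectors are combined into a single protein vector. As a minimal sketch, assuming a ProtVec-style scheme in which a sequence is split into overlapping 3-mers (3-residue "words") and the protein vector is the mean of the 3-mer embeddings, that step could look like the following. The functions `kmer_tokens` and `protein_vector` and the `toy_embeddings` table are hypothetical illustrations, not the authors' code; in practice the 3-mer vectors would come from a Word2vec model trained on a protein corpus (for example with a library such as gensim), which is not shown here.

```python
from typing import Dict, List, Sequence


def kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mers (the "words")."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def protein_vector(
    sequence: str, embeddings: Dict[str, Sequence[float]], k: int = 3
) -> List[float]:
    """Average the embedding vectors of every k-mer present in the vocabulary.

    Assumption: mean pooling; k-mers missing from the vocabulary are skipped.
    """
    tokens = [t for t in kmer_tokens(sequence, k) if t in embeddings]
    if not tokens:
        raise ValueError("no k-mer of the sequence is in the embedding vocabulary")
    dim = len(embeddings[tokens[0]])
    total = [0.0] * dim
    for t in tokens:
        for j, value in enumerate(embeddings[t]):
            total[j] += value
    return [x / len(tokens) for x in total]


# Hypothetical 2-dimensional vectors standing in for trained Word2vec embeddings.
toy_embeddings = {
    "MFV": [1.0, 0.0],
    "FVF": [0.0, 1.0],
    "VFL": [1.0, 1.0],
}

print(kmer_tokens("MFVFL"))                    # ['MFV', 'FVF', 'VFL']
print(protein_vector("MFVFL", toy_embeddings))  # [0.6666666666666666, 0.6666666666666666]
```

Pooling gives every sequence a vector of the same length regardless of how many residues it has, which is what allows standard classifiers such as RF, SVM, or LDA to be trained on it; summing the 3-mer vectors instead of averaging them is another common choice.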
Record ID: pubmed-9119569
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Int J Inf Technol (International Journal of Information Technology), Original Research
Published online: 2022-05-19, Springer Nature Singapore
Copyright: © The Author(s), under exclusive licence to Bharati Vidyapeeth's Institute of Computer Applications and Management 2022
License: This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.