A comprehensive tool for rapid and accurate prediction of disease using DNA sequence classifier

In the current pandemic, the coronavirus is spreading rapidly from one human to another. In addition, there are millions of viruses, for example Ebola and SARS, that can spread as fast as the coronavirus because of the mobilization and globalization of the population...

Bibliographic Details
Main Authors: Mathur, Garima; Pandey, Anjana; Goyal, Sachin
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9243743/
https://www.ncbi.nlm.nih.gov/pubmed/35789598
http://dx.doi.org/10.1007/s12652-022-04099-y
author Mathur, Garima
Pandey, Anjana
Goyal, Sachin
collection PubMed
description In the current pandemic, the coronavirus is spreading rapidly from one human to another. In addition, there are millions of viruses, for example Ebola and SARS, that can spread as fast as the coronavirus because of the mobilization and globalization of the population, and that are equally deadly. Earlier identification of these viruses can prevent outbreaks such as the one we currently face and can help in the earlier design of drugs. Identification of disease at an early stage can be achieved through DNA sequence classification, as DNA carries most of the genetic information about organisms. This is why the classification of DNA sequences plays an important role in computational biology. This paper presents a solution in which samples collected from NCBI are used for the classification of DNA sequences. DNA sequence classification in turn gives the patterns of various diseases; these patterns are then compared with samples from a newly infected person and can help in the earlier identification of disease. However, feature extraction remains a major challenge. In this paper, a machine learning-based classifier and a new technique for extracting features from DNA sequences based on a hot vector matrix are proposed. In the hot vector representation of the DNA sequence, each pair of the word is represented using a binary matrix that represents the position of each nucleotide in the DNA sequence. The resultant matrix is then given as input to a traditional CNN for feature extraction.
The results of the proposed method have been compared with five well-known classifiers, namely Convolutional Neural Network (CNN), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Decision Tree, and Recurrent Neural Network (RNN), on several parameters including precision and accuracy. The results show that the proposed method achieves an accuracy of 93.9%, the highest among the compared classifiers.
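The binary-matrix encoding mentioned in the description can be illustrated with a minimal sketch. This record does not give the paper's exact scheme, so the code below shows only the common per-nucleotide one-hot encoding, not the authors' method. The name `one_hot_encode`, the A/C/G/T column order, and the all-zero row for ambiguous bases are assumptions made for illustration.

```python
# Illustrative sketch only: a standard one-hot encoding of a DNA string.
# The column order and the handling of unknown bases are assumptions,
# not details taken from the paper.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Return a len(sequence) x 4 binary matrix, one row per position."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    matrix = []
    for base in sequence.upper():
        row = [0] * len(NUCLEOTIDES)
        if base in index:
            # Mark the column that corresponds to this nucleotide.
            row[index[base]] = 1
        # Bases outside A/C/G/T (for example N) are left as an all-zero row.
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    for row in one_hot_encode("ACGTN"):
        print(row)
```

A matrix of this shape (sequence length by 4) is the kind of input a 1D convolutional layer can consume, with the four nucleotide columns acting as channels.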
format Online
Article
Text
id pubmed-9243743
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-9243743 2022-06-30 J Ambient Intell Humaniz Comput Original Research Springer Berlin Heidelberg 2022-06-25 /pmc/articles/PMC9243743/ /pubmed/35789598 http://dx.doi.org/10.1007/s12652-022-04099-y Text en © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title A comprehensive tool for rapid and accurate prediction of disease using DNA sequence classifier
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9243743/
https://www.ncbi.nlm.nih.gov/pubmed/35789598
http://dx.doi.org/10.1007/s12652-022-04099-y