
Hamlet-Pattern-Based Automated COVID-19 and Influenza Detection Model Using Protein Sequences


Bibliographic Details
Main Authors: Erten, Mehmet, Acharya, Madhav R., Kamath, Aditya P., Sampathila, Niranjana, Bairy, G. Muralidhar, Aydemir, Emrah, Barua, Prabal Datta, Baygin, Mehmet, Tuncer, Ilknur, Dogan, Sengul, Tuncer, Turker
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9777168/
https://www.ncbi.nlm.nih.gov/pubmed/36553188
http://dx.doi.org/10.3390/diagnostics12123181
author Erten, Mehmet
Acharya, Madhav R.
Kamath, Aditya P.
Sampathila, Niranjana
Bairy, G. Muralidhar
Aydemir, Emrah
Barua, Prabal Datta
Baygin, Mehmet
Tuncer, Ilknur
Dogan, Sengul
Tuncer, Turker
author_sort Erten, Mehmet
collection PubMed
description SARS-CoV-2 and Influenza-A can present similar symptoms. Computer-aided diagnosis can help facilitate screening for the two conditions, and may be especially relevant and useful in the current COVID-19 pandemic because seasonal Influenza-A infection can still occur. We have developed a novel text-based classification model for discriminating between the two conditions using protein sequences of varying lengths. We downloaded viral protein sequences of SARS-CoV-2 and Influenza-A with varying lengths (all 100 or greater) from the NCBI database and randomly selected 16,901 SARS-CoV-2 and 19,523 Influenza-A sequences to form a two-class study dataset. We used a new feature extraction function based on a unique pattern, HamletPat, generated from the text of Shakespeare’s Hamlet, and a signum function to extract local binary pattern-like bits from overlapping fixed-length (27) blocks of the protein sequences. The bits were converted to decimal map signals from which histograms were extracted and concatenated to form a final feature vector of length 1280. The iterative Chi-square function selected the 340 most discriminative features to feed to an SVM with a Gaussian kernel for classification. The model attained 99.92% and 99.87% classification accuracy rates using hold-out (75:25 split ratio) and five-fold cross-validations, respectively. The excellent performance of the lightweight, handcrafted HamletPat-based classification model suggests that it can be a valuable tool for screening protein sequences to discriminate between SARS-CoV-2 and Influenza-A infections.
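The feature extraction step in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give the actual HamletPat index pattern, so the pattern below is a hypothetical stand-in built from consecutive characters of a short Hamlet quotation (letters a-z and space mapped to positions 0 to 26, which fits a block length of 27). The 8-bit code size and the use of five maps are also assumptions, chosen only because 5 × 256 matches the stated feature length of 1280. The names `build_pattern`, `signum_bit`, and `extract_features` are illustrative. Feature selection and SVM classification are not covered.

```python
from typing import List, Tuple

BLOCK_LEN = 27     # overlapping block length stated in the description
BITS_PER_MAP = 8   # assumption: 8-bit codes, as in a classic local binary pattern
N_MAPS = 5         # assumption: 5 maps x 256 bins = 1280 features

# Hypothetical stand-in for HamletPat: a-z and space map to block positions 0..26.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
TEXT = "to be or not to be that is the question whether"


def build_pattern(text: str, n_pairs: int) -> List[Tuple[int, int]]:
    """Turn consecutive characters of the text into (i, j) position pairs."""
    idx = [ALPHABET.index(ch) for ch in text.lower() if ch in ALPHABET]
    pairs = list(zip(idx, idx[1:]))[:n_pairs]
    if len(pairs) < n_pairs:
        raise ValueError("text is too short for the requested number of pairs")
    return pairs


PATTERN = build_pattern(TEXT, BITS_PER_MAP * N_MAPS)


def signum_bit(a: int, b: int) -> int:
    """Signum-style comparison: 1 if a - b >= 0, otherwise 0."""
    return 1 if a - b >= 0 else 0


def extract_features(sequence: str) -> List[int]:
    """Concatenate one 256-bin histogram per map over all overlapping blocks."""
    values = [ord(ch) for ch in sequence.upper()]  # assumption: residues as character codes
    hist = [0] * (N_MAPS * 256)
    for start in range(len(values) - BLOCK_LEN + 1):
        block = values[start:start + BLOCK_LEN]
        bits = [signum_bit(block[i], block[j]) for i, j in PATTERN]
        for m in range(N_MAPS):
            code = 0
            for bit in bits[m * BITS_PER_MAP:(m + 1) * BITS_PER_MAP]:
                code = (code << 1) | bit  # binary-to-decimal conversion
            hist[m * 256 + code] += 1
    return hist


features = extract_features("M" * 30)
print(len(features))
```

In this sketch a sequence shorter than 27 residues yields an all-zero vector, which is consistent with the description's note that all sequences in the study had a length of 100 or greater.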
format Online
Article
Text
id pubmed-9777168
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9777168 2022-12-23
Hamlet-Pattern-Based Automated COVID-19 and Influenza Detection Model Using Protein Sequences
Diagnostics (Basel)
Article
MDPI 2022-12-15
/pmc/articles/PMC9777168/ /pubmed/36553188 http://dx.doi.org/10.3390/diagnostics12123181
Text en
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Hamlet-Pattern-Based Automated COVID-19 and Influenza Detection Model Using Protein Sequences
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9777168/
https://www.ncbi.nlm.nih.gov/pubmed/36553188
http://dx.doi.org/10.3390/diagnostics12123181