Hamlet-Pattern-Based Automated COVID-19 and Influenza Detection Model Using Protein Sequences
SARS-CoV-2 and Influenza-A can present similar symptoms. Computer-aided diagnosis can help facilitate screening for the two conditions, and may be especially relevant and useful in the current COVID-19 pandemic because seasonal Influenza-A infection can still occur. We have developed a novel text-ba...
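Main authors: | Mehmet Erten, Madhav R. Acharya, Aditya P. Kamath, Niranjana Sampathila, G. Muralidhar Bairy, Emrah Aydemir, Prabal Datta Barua, Mehmet Baygin, Ilknur Tuncer, Sengul Dogan, Turker Tuncer |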
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9777168/ https://www.ncbi.nlm.nih.gov/pubmed/36553188 http://dx.doi.org/10.3390/diagnostics12123181 |
_version_ | 1784856037491212288 |
---|---|
author | Erten, Mehmet; Acharya, Madhav R.; Kamath, Aditya P.; Sampathila, Niranjana; Bairy, G. Muralidhar; Aydemir, Emrah; Barua, Prabal Datta; Baygin, Mehmet; Tuncer, Ilknur; Dogan, Sengul; Tuncer, Turker |
author_facet | Erten, Mehmet; Acharya, Madhav R.; Kamath, Aditya P.; Sampathila, Niranjana; Bairy, G. Muralidhar; Aydemir, Emrah; Barua, Prabal Datta; Baygin, Mehmet; Tuncer, Ilknur; Dogan, Sengul; Tuncer, Turker |
author_sort | Erten, Mehmet |
collection | PubMed |
description | SARS-CoV-2 and Influenza-A can present similar symptoms. Computer-aided diagnosis can help facilitate screening for the two conditions, and may be especially relevant and useful in the current COVID-19 pandemic because seasonal Influenza-A infection can still occur. We have developed a novel text-based classification model for discriminating between the two conditions using protein sequences of varying lengths. We downloaded viral protein sequences of SARS-CoV-2 and Influenza-A with varying lengths (all 100 or greater) from the NCBI database and randomly selected 16,901 SARS-CoV-2 and 19,523 Influenza-A sequences to form a two-class study dataset. We used a new feature extraction function based on a unique pattern, HamletPat, generated from the text of Shakespeare’s Hamlet, and a signum function to extract local binary pattern-like bits from overlapping fixed-length (27) blocks of the protein sequences. The bits were converted to decimal map signals from which histograms were extracted and concatenated to form a final feature vector of length 1280. The iterative Chi-square function selected the 340 most discriminative features to feed to an SVM with a Gaussian kernel for classification. The model attained 99.92% and 99.87% classification accuracy rates using hold-out (75:25 split ratio) and five-fold cross-validations, respectively. The excellent performance of the lightweight, handcrafted HamletPat-based classification model suggests that it can be a valuable tool for screening protein sequences to discriminate between SARS-CoV-2 and Influenza-A infections. |
format | Online Article Text |
id | pubmed-9777168 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9777168 2022-12-23 Hamlet-Pattern-Based Automated COVID-19 and Influenza Detection Model Using Protein Sequences Diagnostics (Basel) Article MDPI 2022-12-15 /pmc/articles/PMC9777168/ /pubmed/36553188 http://dx.doi.org/10.3390/diagnostics12123181 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Erten, Mehmet Acharya, Madhav R. Kamath, Aditya P. Sampathila, Niranjana Bairy, G. Muralidhar Aydemir, Emrah Barua, Prabal Datta Baygin, Mehmet Tuncer, Ilknur Dogan, Sengul Tuncer, Turker Hamlet-Pattern-Based Automated COVID-19 and Influenza Detection Model Using Protein Sequences |
title | Hamlet-Pattern-Based Automated COVID-19 and Influenza Detection Model Using Protein Sequences |
title_sort | hamlet-pattern-based automated covid-19 and influenza detection model using protein sequences |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9777168/ https://www.ncbi.nlm.nih.gov/pubmed/36553188 http://dx.doi.org/10.3390/diagnostics12123181 |
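
The feature extraction step summarized in the description field can be illustrated with a minimal Python sketch. It is not the authors' implementation: this record does not describe how HamletPat is built, so the pattern below is a hypothetical stand-in derived from one line of the play. The 27-symbol alphabet, the use of character codes as residue values, and the split of the 1280 features into 5 histograms of 256 bins (8 bits per map value) are likewise assumptions chosen to match the stated block length and vector length. Feature selection and the SVM are not covered.

```python
from typing import List, Tuple

BLOCK_LEN = 27      # block length stated in the abstract
BITS_PER_MAP = 8    # assumption: 8 bits per map value, as in classic LBP
NUM_MAPS = 5        # assumption: 5 maps x 256 bins = 1280 features

# Assumption: 26 letters plus space give one symbol per block position.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
# Hypothetical source text for the stand-in pattern.
QUOTE = "to be or not to be that is the question whether tis nobler"


def build_pattern(text: str, n_pairs: int) -> List[Tuple[int, int]]:
    """Hypothetical stand-in for HamletPat: consecutive characters of
    `text` are mapped to pairs of positions inside a block."""
    idx = [ALPHABET.index(c) for c in text.lower() if c in ALPHABET]
    pairs = list(zip(idx, idx[1:]))
    if len(pairs) < n_pairs:
        raise ValueError("text too short for the requested number of pairs")
    return pairs[:n_pairs]


def signum_bit(a: int, b: int) -> int:
    """Signum-style comparison: 1 if a - b >= 0, otherwise 0."""
    return 1 if a - b >= 0 else 0


def extract_features(seq: str, pattern: List[Tuple[int, int]]) -> List[int]:
    """Slide a block over the sequence (stride 1), compare the residue
    pairs named by the pattern, pack each group of 8 bits into a decimal
    map value, and accumulate one 256-bin histogram per map."""
    values = [ord(ch) for ch in seq]  # assumption: character code per residue
    hist = [0] * (NUM_MAPS * 2 ** BITS_PER_MAP)
    for start in range(len(values) - BLOCK_LEN + 1):
        block = values[start:start + BLOCK_LEN]
        bits = [signum_bit(block[i], block[j]) for i, j in pattern]
        for m in range(NUM_MAPS):
            value = 0
            for bit in bits[m * BITS_PER_MAP:(m + 1) * BITS_PER_MAP]:
                value = (value << 1) | bit
            hist[m * 2 ** BITS_PER_MAP + value] += 1
    return hist


if __name__ == "__main__":
    pattern = build_pattern(QUOTE, NUM_MAPS * BITS_PER_MAP)
    # Arbitrary example string of residue letters, for illustration only.
    features = extract_features("MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSS", pattern)
    print(len(features))  # 1280
```

Each overlapping block contributes one count to each of the five histograms, so the concatenated vector always has 1280 entries regardless of sequence length. In the workflow the abstract describes, such a vector would then go through iterative Chi-square selection (340 features kept) and a Gaussian-kernel SVM.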