Simple Classification of RNA Sequences of Respiratory-Related Coronaviruses

A very simple, fast, and efficient approach to analyze and identify respiratory-related virus sequences based on machine learning is proposed. Such schemes are very important in identifying viruses, especially in view of spreading pandemics. The method is based on genetic code rule...

Bibliographic Details
Main Authors: Oberer, Louis, Carral, Angel Diaz, Fyta, Maria
Format: Online Article Text
Language: English
Published: American Chemical Society 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8353891/
https://www.ncbi.nlm.nih.gov/pubmed/34395967
http://dx.doi.org/10.1021/acsomega.1c01625
author Oberer, Louis
Carral, Angel Diaz
Fyta, Maria
collection PubMed
description A very simple, fast, and efficient approach to analyze and identify respiratory-related virus sequences based on machine learning is proposed. Such schemes are very important in identifying viruses, especially in view of spreading pandemics. The method is based on genetic code rules and the open reading frame (ORF). Data from the respiratory-related coronaviruses are collected and features are extracted based on reoccurring nucleobase 3-tuples in the RNA. Our methodology is simply based on counting nucleobase triplets, normalizing the count to the length of the sequence, and applying principal component analysis (PCA) techniques. The triplet counting can be further used for classification purposes. DNA sequences from the herpes virus family can be considered as the first step towards a complete and accurate classification including more complex factors, such as mutations. The proposed classification scheme is simply based on “counting” biological information. It can serve as the first fast detection method, widely accessible and portable to a variety of distinct architectures for fast and on-the-fly detection. We provide an approach that can be further optimized and combined with supervised techniques to allow for more accurate detection and read out of the exact virus type or sequence. We discuss the relevance of this scheme in identifying differences in similar viruses and their impact on biochemical analysis.
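The counting, normalization, and PCA steps named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `frame` parameter, the choice of non-overlapping triplets read in a single frame, and the use of power iteration to obtain only the first principal component are all assumptions made for illustration, and the record does not say how ambiguous symbols are handled.

```python
from itertools import product

BASES = "ACGU"
# All 64 possible triplets in a fixed order, so every sequence
# maps to a feature vector with the same layout.
TRIPLETS = ["".join(p) for p in product(BASES, repeat=3)]


def triplet_features(sequence, frame=0):
    """Count non-overlapping triplets starting at `frame` (assumed reading
    frame) and divide each count by the sequence length in nucleotides."""
    seq = sequence.upper().replace("T", "U")  # accept DNA-style input
    counts = dict.fromkeys(TRIPLETS, 0)
    for i in range(frame, len(seq) - 2, 3):
        triplet = seq[i:i + 3]
        if triplet in counts:  # assumption: skip triplets with symbols such as N
            counts[triplet] += 1
    length = len(seq)
    return [counts[t] / length if length else 0.0 for t in TRIPLETS]


def first_component_scores(rows, iterations=100):
    """Project feature vectors onto their first principal component,
    found by power iteration on the centered data (pure-Python stand-in
    for a library PCA routine)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Start from the centered row with the largest norm, so the start
    # vector lies in the row space of the data.
    v = max(centered, key=lambda r: sum(x * x for x in r))
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(r, v)) for r in centered]  # X v
        w = [sum(scores[i] * centered[i][j] for i in range(n))
             for j in range(d)]  # X^T (X v)
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # no variance in the data: every score is zero
            return [0.0] * n
        v = [x / norm for x in w]
    return [sum(a * b for a, b in zip(r, v)) for r in centered]


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    toy = ["AUG" * 4, "AUG" * 3 + "AAA", "CCC" * 4]
    features = [triplet_features(s) for s in toy]
    print(first_component_scores(features))
```

In this sketch each sequence becomes a 64-dimensional vector, and sequences with similar triplet usage end up close together along the first component. In practice a library PCA with several components would likely replace the hand-written power iteration.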
format Online
Article
Text
id pubmed-8353891
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-8353891 2021-08-10 Simple Classification of RNA Sequences of Respiratory-Related Coronaviruses Oberer, Louis Carral, Angel Diaz Fyta, Maria ACS Omega American Chemical Society 2021-07-28 /pmc/articles/PMC8353891/ /pubmed/34395967 http://dx.doi.org/10.1021/acsomega.1c01625 Text en © 2021 The Authors. Published by American Chemical Society.
https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works.
title Simple Classification of RNA Sequences of Respiratory-Related Coronaviruses