Simple Classification of RNA Sequences of Respiratory-Related Coronaviruses
A very simple, fast, and efficient approach to analyze and identify respiratory-related virus sequences based on machine learning is proposed. Such schemes are very important in identifying viruses, especially in view of spreading pandemics. The method is based on genetic code rule...
Main authors: | Oberer, Louis; Carral, Angel Diaz; Fyta, Maria |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2021 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8353891/ https://www.ncbi.nlm.nih.gov/pubmed/34395967 http://dx.doi.org/10.1021/acsomega.1c01625 |
_version_ | 1783736494095073280 |
---|---|
author | Oberer, Louis; Carral, Angel Diaz; Fyta, Maria |
author_sort | Oberer, Louis |
collection | PubMed |
description | A very simple, fast, and efficient approach to analyze and identify respiratory-related virus sequences based on machine learning is proposed. Such schemes are very important in identifying viruses, especially in view of spreading pandemics. The method is based on genetic code rules and the open reading frame (ORF). Data from the respiratory-related coronaviruses are collected and features are extracted based on reoccurring nucleobase 3-tuples in the RNA. Our methodology is simply based on counting nucleobase triplets, normalizing the count to the length of the sequence, and applying principal component analysis (PCA) techniques. The triplet counting can be further used for classification purposes. DNA sequences from the herpes virus family can be considered as the first step towards a complete and accurate classification including more complex factors, such as mutations. The proposed classification scheme is simply based on “counting” biological information. It can serve as the first fast detection method, widely accessible and portable to a variety of distinct architectures for fast and on-the-fly detection. We provide an approach that can be further optimized and combined with supervised techniques to allow for more accurate detection and read out of the exact virus type or sequence. We discuss the relevance of this scheme in identifying differences in similar viruses and their impact on biochemical analysis. |
format | Online Article Text |
id | pubmed-8353891 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-8353891 2021-08-10 Simple Classification of RNA Sequences of Respiratory-Related Coronaviruses Oberer, Louis; Carral, Angel Diaz; Fyta, Maria ACS Omega American Chemical Society 2021-07-28 /pmc/articles/PMC8353891/ /pubmed/34395967 http://dx.doi.org/10.1021/acsomega.1c01625 Text en © 2021 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | Simple Classification of RNA Sequences of Respiratory-Related Coronaviruses |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8353891/ https://www.ncbi.nlm.nih.gov/pubmed/34395967 http://dx.doi.org/10.1021/acsomega.1c01625 |
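
The feature extraction summarized in the description field (counting nucleobase triplets, normalizing the counts by sequence length, then applying PCA) can be illustrated with a minimal sketch. This is not the authors' code: the function names `triplet_features` and `pca`, the `step` parameter (1 for an overlapping window, 3 for a codon-like reading frame starting at position 0), the handling of ambiguous bases, and the toy sequences are all assumptions made for illustration. PCA is done here with a plain NumPy SVD rather than any particular library routine.

```python
from itertools import product

import numpy as np

BASES = "ACGU"
# All 64 possible nucleobase triplets, in a fixed order.
TRIPLETS = ["".join(p) for p in product(BASES, repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_features(seq, step=1):
    """Count triplets in `seq` and normalize by the sequence length.

    Assumption: `step=1` slides one base at a time (overlapping),
    `step=3` reads consecutive triplets from position 0.
    Triplets containing symbols other than A, C, G, U are skipped.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    counts = np.zeros(len(TRIPLETS))
    for i in range(0, len(seq) - 2, step):
        j = INDEX.get(seq[i:i + 3])
        if j is not None:
            counts[j] += 1
    return counts / max(len(seq), 1)


def pca(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


# Hypothetical toy sequences, not real viral genomes.
seqs = [
    "AUGGCUAUGGCUAUGGCU",
    "AUGAAACCCGGGUUUAAA",
    "GGGCCCGGGCCCGGGCCC",
]
X = np.vstack([triplet_features(s) for s in seqs])  # shape (3, 64)
Z = pca(X, n_components=2)                          # shape (3, 2)
print(Z)
```

Each row of `Z` is one sequence in the reduced space; in a real setting the rows of `X` would come from full genome records, and the projected points could then be inspected or passed to a classifier.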