Learning vector quantization as an interpretable classifier for the detection of SARS-CoV-2 types based on their RNA sequences

We present an approach to discriminate SARS-CoV-2 virus types based on their RNA sequence descriptions avoiding a sequence alignment. For that purpose, sequences are preprocessed by feature extraction and the resulting feature vectors are analyzed by prototype-based classification to remain interpre...

Bibliographic Details
Main authors:	Kaden, Marika, Bohnsack, Katrin Sophie, Weber, Mirko, Kudła, Mateusz, Gutowska, Kaja, Blazewicz, Jacek, Villmann, Thomas
Format:	Online Article Text
Language:	English
Published:	Springer London 2021
Subjects:	S.i. : Wsom 2019
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8076884/ https://www.ncbi.nlm.nih.gov/pubmed/33935376 http://dx.doi.org/10.1007/s00521-021-06018-2

author	Kaden, Marika Bohnsack, Katrin Sophie Weber, Mirko Kudła, Mateusz Gutowska, Kaja Blazewicz, Jacek Villmann, Thomas
author_sort	Kaden, Marika
collection	PubMed
description	We present an approach to discriminate SARS-CoV-2 virus types based on their RNA sequence descriptions, avoiding a sequence alignment. For that purpose, sequences are preprocessed by feature extraction and the resulting feature vectors are analyzed by prototype-based classification to remain interpretable. In particular, we propose to use variants of learning vector quantization (LVQ) based on dissimilarity measures for RNA sequence data. The respective matrix LVQ provides additional knowledge about the classification decisions, such as discriminant feature correlations, and can additionally be equipped with easy-to-realize reject options for uncertain data. Those options provide self-controlled evidence, i.e., the model refuses to make a classification decision if the model evidence for the presented data is not sufficient. This model is first trained using a GISAID dataset with given virus types detected according to the molecular differences in coronavirus populations by phylogenetic tree clustering. In a second step, we apply the trained model to another, unlabeled SARS-CoV-2 virus dataset. For these data, we can either assign a virus type to the sequences or reject atypical samples. Those rejected sequences allow speculation about new virus types with respect to nucleotide base mutations in the viral sequences. Moreover, this rejection analysis improves model robustness. Last but not least, the presented approach has lower computational complexity compared to methods based on (multiple) sequence alignment. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00521-021-06018-2.
format	Online Article Text
id	pubmed-8076884
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Springer London
record_format	MEDLINE/PubMed
spelling	pubmed-8076884 2021-04-27 Neural Comput Appl S.i. : Wsom 2019 Springer London 2021-04-27 2022 /pmc/articles/PMC8076884/ /pubmed/33935376 http://dx.doi.org/10.1007/s00521-021-06018-2 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title	Learning vector quantization as an interpretable classifier for the detection of SARS-CoV-2 types based on their RNA sequences
topic	S.i. : Wsom 2019
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8076884/ https://www.ncbi.nlm.nih.gov/pubmed/33935376 http://dx.doi.org/10.1007/s00521-021-06018-2

Similar Items