Using an Unsupervised Clustering Model to Detect the Early Spread of SARS-CoV-2 Worldwide

Deciphering the population structure of SARS-CoV-2 is critical to inform public health management and reduce the risk of future dissemination. With the continuous accruing of SARS-CoV-2 genomes worldwide, discovering an effective way to group these genomes is critical for organizing the landscape of...

Bibliographic Details
Main authors: Li, Yawei; Liu, Qingyun; Zeng, Zexian; Luo, Yuan
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9030792/
https://www.ncbi.nlm.nih.gov/pubmed/35456454
http://dx.doi.org/10.3390/genes13040648
_version_ 1784692229051252736
author Li, Yawei
Liu, Qingyun
Zeng, Zexian
Luo, Yuan
collection PubMed
description Deciphering the population structure of SARS-CoV-2 is critical to inform public health management and reduce the risk of future dissemination. With the continuous accruing of SARS-CoV-2 genomes worldwide, discovering an effective way to group these genomes is critical for organizing the landscape of the population structure of the virus. Taking advantage of recently published state-of-the-art machine learning algorithms, we used an unsupervised deep learning clustering algorithm to group a total of 16,873 SARS-CoV-2 genomes. Using single nucleotide polymorphisms as input features, we identified six major subtypes of SARS-CoV-2. The proportions of the clusters across the continents revealed distinct geographical distributions. Comprehensive analysis indicated that both genetic factors and human migration factors shaped the specific geographical distribution of the population structure. This study provides a different approach using clustering methods to study the population structure of a never-seen-before and fast-growing species such as SARS-CoV-2. Moreover, clustering techniques can be used for further studies of local population structures of the proliferating virus.
format Online
Article
Text
id pubmed-9030792
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-90307922022-04-23 Using an Unsupervised Clustering Model to Detect the Early Spread of SARS-CoV-2 Worldwide Li, Yawei Liu, Qingyun Zeng, Zexian Luo, Yuan Genes (Basel) Article Deciphering the population structure of SARS-CoV-2 is critical to inform public health management and reduce the risk of future dissemination. With the continuous accruing of SARS-CoV-2 genomes worldwide, discovering an effective way to group these genomes is critical for organizing the landscape of the population structure of the virus. Taking advantage of recently published state-of-the-art machine learning algorithms, we used an unsupervised deep learning clustering algorithm to group a total of 16,873 SARS-CoV-2 genomes. Using single nucleotide polymorphisms as input features, we identified six major subtypes of SARS-CoV-2. The proportions of the clusters across the continents revealed distinct geographical distributions. Comprehensive analysis indicated that both genetic factors and human migration factors shaped the specific geographical distribution of the population structure. This study provides a different approach using clustering methods to study the population structure of a never-seen-before and fast-growing species such as SARS-CoV-2. Moreover, clustering techniques can be used for further studies of local population structures of the proliferating virus. MDPI 2022-04-07 /pmc/articles/PMC9030792/ /pubmed/35456454 http://dx.doi.org/10.3390/genes13040648 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Using an Unsupervised Clustering Model to Detect the Early Spread of SARS-CoV-2 Worldwide
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9030792/
https://www.ncbi.nlm.nih.gov/pubmed/35456454
http://dx.doi.org/10.3390/genes13040648