Unsupervised clustering analysis of SARS-Cov-2 population structure reveals six major subtypes at early stage across the world

Identifying the population structure of the newly emerged coronavirus SARS-CoV-2 has significant potential to inform public health management and diagnosis. As SARS-CoV-2 sequencing data accrued, grouping them into clusters is important for organizing the landscape of the population structure of the...

Bibliographic Details
Main authors: Li, Yawei; Liu, Qingyun; Zeng, Zexian; Luo, Yuan
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8629198/
https://www.ncbi.nlm.nih.gov/pubmed/34845455
http://dx.doi.org/10.1101/2020.09.04.283358
author Li, Yawei
Liu, Qingyun
Zeng, Zexian
Luo, Yuan
collection PubMed
description Identifying the population structure of the newly emerged coronavirus SARS-CoV-2 has significant potential to inform public health management and diagnosis. As SARS-CoV-2 sequencing data accrued, grouping them into clusters is important for organizing the landscape of the population structure of the virus. Due to the limited prior information on the newly emerged coronavirus, we utilized four different clustering algorithms to group 16,873 SARS-CoV-2 strains, which automatically enables the identification of spatial structure for SARS-CoV-2. A total of six distinct genomic clusters were identified using mutation profiles as input features. Comparison of the clustering results reveals that the four algorithms produced highly consistent results, but the state-of-the-art unsupervised deep learning clustering algorithm performed best and produced the smallest intra-cluster pairwise genetic distances. The varied proportions of the six clusters within different continents revealed specific geographical distributions. In particular, our analysis found that Oceania was the only continent on which the strains were dispersively distributed into six clusters. In summary, this study provides a concrete framework for the use of clustering methods to study the global population structure of SARS-CoV-2. In addition, clustering methods can be used for future studies of variant population structures in specific regions of these fast-growing viruses.
format Online
Article
Text
id pubmed-8629198
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-8629198 2021-11-30 Unsupervised clustering analysis of SARS-Cov-2 population structure reveals six major subtypes at early stage across the world Li, Yawei Liu, Qingyun Zeng, Zexian Luo, Yuan bioRxiv Article
Cold Spring Harbor Laboratory 2021-11-24 /pmc/articles/PMC8629198/ /pubmed/34845455 http://dx.doi.org/10.1101/2020.09.04.283358 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title Unsupervised clustering analysis of SARS-Cov-2 population structure reveals six major subtypes at early stage across the world
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8629198/
https://www.ncbi.nlm.nih.gov/pubmed/34845455
http://dx.doi.org/10.1101/2020.09.04.283358