Unsupervised clustering analysis of SARS-Cov-2 population structure reveals six major subtypes at early stage across the world
Identifying the population structure of the newly emerged coronavirus SARS-CoV-2 has significant potential to inform public health management and diagnosis. As SARS-CoV-2 sequencing data accrue, grouping the sequences into clusters is important for organizing the landscape of the population structure of the...
Main authors: | Li, Yawei; Liu, Qingyun; Zeng, Zexian; Luo, Yuan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8629198/ https://www.ncbi.nlm.nih.gov/pubmed/34845455 http://dx.doi.org/10.1101/2020.09.04.283358 |
_version_ | 1784607153967857664 |
---|---|
author | Li, Yawei Liu, Qingyun Zeng, Zexian Luo, Yuan |
author_sort | Li, Yawei |
collection | PubMed |
description | Identifying the population structure of the newly emerged coronavirus SARS-CoV-2 has significant potential to inform public health management and diagnosis. As SARS-CoV-2 sequencing data accrue, grouping the sequences into clusters is important for organizing the landscape of the population structure of the virus. Because prior information on the newly emerged coronavirus was limited, we used four different clustering algorithms to group 16,873 SARS-CoV-2 strains, which enables automatic identification of spatial structure for SARS-CoV-2. A total of six distinct genomic clusters were identified using mutation profiles as input features. Comparison of the clustering results shows that the four algorithms produced highly consistent results, but the state-of-the-art unsupervised deep learning clustering algorithm performed best and produced the smallest intra-cluster pairwise genetic distances. The proportions of the six clusters varied across continents, revealing specific geographical distributions. In particular, our analysis found that Oceania was the only continent on which the strains were dispersed across all six clusters. In summary, this study provides a concrete framework for using clustering methods to study the global population structure of SARS-CoV-2. In addition, clustering methods can be used in future studies of variant population structures in specific regions of these fast-growing viruses. |
format | Online Article Text |
id | pubmed-8629198 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-8629198 2021-11-30 Unsupervised clustering analysis of SARS-Cov-2 population structure reveals six major subtypes at early stage across the world Li, Yawei Liu, Qingyun Zeng, Zexian Luo, Yuan bioRxiv Article Cold Spring Harbor Laboratory 2021-11-24 /pmc/articles/PMC8629198/ /pubmed/34845455 http://dx.doi.org/10.1101/2020.09.04.283358 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
title | Unsupervised clustering analysis of SARS-Cov-2 population structure reveals six major subtypes at early stage across the world |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8629198/ https://www.ncbi.nlm.nih.gov/pubmed/34845455 http://dx.doi.org/10.1101/2020.09.04.283358 |