Deciphering the Subtype Differentiation History of SARS-CoV-2 Based on a New Breadth-First Searching Optimized Alignment Method Over a Global Data Set of 24,768 Sequences
SARS-CoV-2 has caused a worldwide pandemic. Existing research on coronavirus mutations is based on small data sets, and multiple sequence alignment using a global-scale data set has yet to be conducted. Statistical analysis of integral mutations and global spread are necessary and could help improve...
Main Authors: | Lin, Qianyu; Huang, Yunchuanxiang; Jiang, Ziyi; Wu, Feng; Ma, Lan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Genetics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7831388/ https://www.ncbi.nlm.nih.gov/pubmed/33505425 http://dx.doi.org/10.3389/fgene.2020.591833 |
_version_ | 1783641619458686976 |
---|---|
author | Lin, Qianyu Huang, Yunchuanxiang Jiang, Ziyi Wu, Feng Ma, Lan |
author_sort | Lin, Qianyu |
collection | PubMed |
description | SARS-CoV-2 has caused a worldwide pandemic. Existing research on coronavirus mutations is based on small data sets, and multiple sequence alignment using a global-scale data set has yet to be conducted. Statistical analysis of integral mutations and global spread are necessary and could help improve primer design for nucleic acid diagnosis and vaccine development. Here, we optimized multiple sequence alignment using a conserved sequence search algorithm to align 24,768 sequences from the GISAID data set. A phylogenetic tree was constructed using the maximum likelihood (ML) method. Coronavirus subtypes were analyzed via t-SNE clustering. We performed haplotype network analysis and t-SNE clustering to analyze the coronavirus origin and spread. Overall, we identified 33 sense, 17 nonsense, 79 amino acid loss, and 4 amino acid insertion mutations in full-length open reading frames. Phylogenetic trees were successfully constructed and samples clustered into subtypes. The COVID-19 pandemic differed among countries and continents. Samples from the United States and western Europe were more diverse, and those from China and Asia mainly contained specific subtypes. Clades G/GH/GR are more likely to be the origin clades of SARS-CoV-2 compared with clades S/L/V. Conserved sequence searches can be used to segment long sequences, making large-scale multisequence alignment possible, facilitating more comprehensive gene mutation analysis. Mutation analysis of the SARS-CoV-2 can inform primer design for nucleic acid diagnosis to improve virus detection efficiency. In addition, research into the characteristics of viral spread and relationships among geographic regions can help formulate health policies and reduce the increase of imported cases. |
format | Online Article Text |
id | pubmed-7831388 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7831388 2021-01-26 Deciphering the Subtype Differentiation History of SARS-CoV-2 Based on a New Breadth-First Searching Optimized Alignment Method Over a Global Data Set of 24,768 Sequences Lin, Qianyu Huang, Yunchuanxiang Jiang, Ziyi Wu, Feng Ma, Lan Front Genet Genetics SARS-CoV-2 has caused a worldwide pandemic. Existing research on coronavirus mutations is based on small data sets, and multiple sequence alignment using a global-scale data set has yet to be conducted. Statistical analysis of integral mutations and global spread are necessary and could help improve primer design for nucleic acid diagnosis and vaccine development. Here, we optimized multiple sequence alignment using a conserved sequence search algorithm to align 24,768 sequences from the GISAID data set. A phylogenetic tree was constructed using the maximum likelihood (ML) method. Coronavirus subtypes were analyzed via t-SNE clustering. We performed haplotype network analysis and t-SNE clustering to analyze the coronavirus origin and spread. Overall, we identified 33 sense, 17 nonsense, 79 amino acid loss, and 4 amino acid insertion mutations in full-length open reading frames. Phylogenetic trees were successfully constructed and samples clustered into subtypes. The COVID-19 pandemic differed among countries and continents. Samples from the United States and western Europe were more diverse, and those from China and Asia mainly contained specific subtypes. Clades G/GH/GR are more likely to be the origin clades of SARS-CoV-2 compared with clades S/L/V. Conserved sequence searches can be used to segment long sequences, making large-scale multisequence alignment possible, facilitating more comprehensive gene mutation analysis. Mutation analysis of the SARS-CoV-2 can inform primer design for nucleic acid diagnosis to improve virus detection efficiency. In addition, research into the characteristics of viral spread and relationships among geographic regions can help formulate health policies and reduce the increase of imported cases. Frontiers Media S.A. 2021-01-11 /pmc/articles/PMC7831388/ /pubmed/33505425 http://dx.doi.org/10.3389/fgene.2020.591833 Text en Copyright © 2021 Lin, Huang, Jiang, Wu and Ma. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Deciphering the Subtype Differentiation History of SARS-CoV-2 Based on a New Breadth-First Searching Optimized Alignment Method Over a Global Data Set of 24,768 Sequences |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7831388/ https://www.ncbi.nlm.nih.gov/pubmed/33505425 http://dx.doi.org/10.3389/fgene.2020.591833 |