Genomic Analysis of the Suspicious SARS-CoV-2 Sequences in the Public Sequencing Database

Bibliographic Details
Main authors: Sun, Xiao; Kan, Chuanwen; Ma, Wentai; Du, Zhenglin; Li, Mingkun
Format: Online, Article, Text
Language: English
Published: American Society for Microbiology, 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9927258/
https://www.ncbi.nlm.nih.gov/pubmed/36622170
http://dx.doi.org/10.1128/spectrum.03426-22
collection PubMed
description SARS-CoV-2 has infected more than 600 million people. However, the origin of the virus is still unclear; knowing where the virus came from could help us prevent future zoonotic epidemics. Sequencing data, particularly metagenomic data, can profile the genomes of all species in a sample, including those not recognized at the time, thus allowing for the identification of the progenitor of SARS-CoV-2 in samples collected before the pandemic. We analyzed data from 5,196 SARS-CoV-2-positive sequencing runs in NCBI’s SRA database whose collection dates were either before 2020 or unknown. We found that the mutation patterns obtained from these suspicious SARS-CoV-2 reads did not match the genome characteristics expected of an unknown progenitor of the virus, suggesting that they may derive from circulating SARS-CoV-2 variants or other coronaviruses. Despite this negative result for tracking the progenitor of SARS-CoV-2, the methods developed in the study could assist in pinpointing the origin of various pathogens in the future. IMPORTANCE Sequences homologous to the SARS-CoV-2 genome were found in numerous sequencing runs in the public database that were not associated with SARS-CoV-2 studies. Because information on sample collection, library preparation, and sequencing is lacking, it is unclear whether these sequences derive from a possible progenitor of SARS-CoV-2 or from contamination by more recent SARS-CoV-2 variants circulating in the population. We have developed a computational framework that infers the evolutionary relationship between sequences by comparing their mutations, which enabled us to rule out the possibility that these suspicious sequences originate from unknown progenitors of SARS-CoV-2.
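The abstract does not spell out how mutations are compared, so the following is only a minimal sketch of the general idea, not the authors' framework. It assumes the reads have already been aligned to the Wuhan-Hu-1 reference and reduced to one base call per covered position; the names `InformativeSite`, `count_states`, and `classify`, and the choice of sites, are hypothetical. The reasoning it illustrates: a pre-pandemic progenitor would be expected to carry the ancestral allele at positions that mutated during the pandemic, so reads carrying the derived alleles point to a circulating variant instead.

```python
"""Hypothetical sketch: compare the alleles carried by reads at informative
sites with the ancestral and derived states expected at those sites."""

from dataclasses import dataclass


@dataclass(frozen=True)
class InformativeSite:
    pos: int        # 1-based position on the Wuhan-Hu-1 reference (MN908947.3)
    ancestral: str  # allele expected before the pandemic expansion (assumed)
    derived: str    # allele gained by later circulating variants (assumed)


def count_states(read_alleles: dict[int, str],
                 sites: list[InformativeSite]) -> tuple[int, int]:
    """Return (ancestral, derived) counts over the sites covered by the reads."""
    ancestral = derived = 0
    for site in sites:
        base = read_alleles.get(site.pos)
        if base == site.ancestral:
            ancestral += 1
        elif base == site.derived:
            derived += 1
        # Uncovered sites and third alleles are ignored.
    return ancestral, derived


def classify(read_alleles: dict[int, str],
             sites: list[InformativeSite]) -> str:
    """Rough call: derived alleles at pandemic-era sites point to a circulating
    variant rather than to an earlier progenitor."""
    ancestral, derived = count_states(read_alleles, sites)
    if ancestral == 0 and derived == 0:
        return "uninformative"
    if derived and not ancestral:
        return "circulating variant"
    if ancestral and not derived:
        return "compatible with an earlier lineage"
    return "mixed"


# Illustrative sites: the linked substitutions characteristic of lineage B.1
# (C241T, C3037T, C14408T, A23403G; the last encodes Spike D614G).
B1_SITES = [
    InformativeSite(241, "C", "T"),
    InformativeSite(3037, "C", "T"),
    InformativeSite(14408, "C", "T"),
    InformativeSite(23403, "A", "G"),
]

if __name__ == "__main__":
    reads = {241: "T", 3037: "T", 23403: "G"}  # made-up pileup calls
    print(classify(reads, B1_SITES))
```

Here three of the four sites are covered and all carry the derived allele, so the call is `circulating variant`. A real analysis would likely need many more informative sites, including positions where a progenitor is expected to differ from Wuhan-Hu-1, and would have to account for sequencing errors and low coverage; ancestral alleles at B.1 sites alone cannot establish a progenitor.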
id pubmed-9927258
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Microbiol Spectr (Microbiology Spectrum), Research Article, pubmed-9927258, 2023-02-15. American Society for Microbiology, 2023-01-09.
Copyright © 2023 Sun et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
topic Research Article