A Snapshot of SARS-CoV-2 Genome Availability up to April 2020 and its Implications: Data Analysis

BACKGROUND: The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been growing exponentially, affecting over 4 million people and causing enormous distress to economies and societies worldwide. A plethora of analyses based on viral sequences has already been published both in...

Bibliographic Details
Main Authors: Mavian, Carla, Marini, Simone, Prosperi, Mattia, Salemi, Marco
Format: Online Article Text
Language: English
Published: JMIR Publications 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7265655/
https://www.ncbi.nlm.nih.gov/pubmed/32412415
http://dx.doi.org/10.2196/19170
_version_ 1783541169504911360
author Mavian, Carla
Marini, Simone
Prosperi, Mattia
Salemi, Marco
author_sort Mavian, Carla
collection PubMed
description BACKGROUND: The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been growing exponentially, affecting over 4 million people and causing enormous distress to economies and societies worldwide. A plethora of analyses based on viral sequences has already been published both in scientific journals and through non–peer-reviewed channels to investigate the genetic heterogeneity and spatiotemporal dissemination of SARS-CoV-2. However, a systematic investigation of phylogenetic information and sampling bias in the available data is lacking. Although the number of available genome sequences of SARS-CoV-2 is growing daily and the sequences show increasing phylogenetic information, country-specific data still present severe limitations and should be interpreted with caution.
OBJECTIVE: The objective of this study was to determine the quality of the currently available SARS-CoV-2 full genome data in terms of sampling bias as well as phylogenetic and temporal signals to inform and guide the scientific community.
METHODS: We used maximum likelihood–based methods to assess the presence of sufficient information for robust phylogenetic and phylogeographic studies in several SARS-CoV-2 sequence alignments assembled from GISAID (Global Initiative on Sharing All Influenza Data) data released between March and April 2020.
RESULTS: Although the number of high-quality full genomes is growing daily, and sequence data released in April 2020 contain sufficient phylogenetic information to allow reliable inference of phylogenetic relationships, country-specific SARS-CoV-2 data sets still present severe limitations.
CONCLUSIONS: At the present time, studies assessing within-country spread or transmission clusters should be considered preliminary or hypothesis-generating at best. Hence, current reports should be interpreted with caution, and concerted efforts should continue to increase the number and quality of sequences required for robust tracing of the epidemic.
format Online
Article
Text
id pubmed-7265655
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-7265655 2020-06-05 A Snapshot of SARS-CoV-2 Genome Availability up to April 2020 and its Implications: Data Analysis Mavian, Carla Marini, Simone Prosperi, Mattia Salemi, Marco JMIR Public Health Surveill Original Paper JMIR Publications 2020-06-01 /pmc/articles/PMC7265655/ /pubmed/32412415 http://dx.doi.org/10.2196/19170 Text en
©Carla Mavian, Simone Marini, Mattia Prosperi, Marco Salemi. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 01.06.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.
title A Snapshot of SARS-CoV-2 Genome Availability up to April 2020 and its Implications: Data Analysis
title_sort snapshot of sars-cov-2 genome availability up to april 2020 and its implications: data analysis
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7265655/
https://www.ncbi.nlm.nih.gov/pubmed/32412415
http://dx.doi.org/10.2196/19170