Cargando…

Small-Sample Estimation of the Mutational Support and Distribution of SARS-CoV-2

We consider the problem of determining the mutational support and distribution of the SARS-CoV-2 viral genome in the small-sample regime. The mutational support refers to the unknown number of sites that may eventually mutate in the SARS-CoV-2 genome while mutational distribution refers to the distr...

Descripción completa

Detalles Bibliográficos
Formato: Online Artículo Texto
Lenguaje:English
Publicado: IEEE 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10009811/
https://www.ncbi.nlm.nih.gov/pubmed/35385386
http://dx.doi.org/10.1109/TCBB.2022.3165395
Descripción
Sumario:We consider the problem of determining the mutational support and distribution of the SARS-CoV-2 viral genome in the small-sample regime. The mutational support refers to the unknown number of sites that may eventually mutate in the SARS-CoV-2 genome while mutational distribution refers to the distribution of point mutations in the viral genome across a population. The mutational support may be used to assess the virulence of the virus and guide primer selection for real-time RT-PCR testing. Estimating the distribution of mutations in the genome of different subpopulations while accounting for the unseen may also aid in discovering new variants. To estimate the mutational support in the small-sample regime, we use GISAID sequencing data and our state-of-the-art polynomial estimation techniques based on new weighted and regularized Chebyshev approximation methods. For distribution estimation, we adapt the well-known Good-Turing estimator. Our analysis reveals several findings: First, the mutational supports exhibit significant differences in the ORF6 and ORF7a regions (older versus younger patients), ORF1b and ORF10 regions (females versus males) and in almost all ORFs (Asia/Europe/North America). Second, even though the N region of SARS-CoV-2 has a predicted [Formula: see text][Formula: see text] mutational support, mutations fall outside of the primer regions recommended by the CDC.