Highly accurate whole-genome imputation of SARS-CoV-2 from partial or low-quality sequences

Bibliographic Details
Main authors: Ortuño, Francisco M, Loucera, Carlos, Casimiro-Soriguer, Carlos S, Lepe, Jose A, Camacho Martinez, Pedro, Merino Diaz, Laura, de Salazar, Adolfo, Chueca, Natalia, García, Federico, Perez-Florido, Javier, Dopazo, Joaquin
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8643610/
https://www.ncbi.nlm.nih.gov/pubmed/34865008
http://dx.doi.org/10.1093/gigascience/giab078
collection PubMed
description BACKGROUND: The current SARS-CoV-2 pandemic has emphasized the utility of viral whole-genome sequencing in the surveillance and control of the pathogen. An unprecedented ongoing global initiative is producing hundreds of thousands of sequences worldwide. However, the complex circumstances in which viruses are sequenced, along with the demand for urgent results, cause a high rate of incomplete and, therefore, useless sequences. Viral sequences evolve in the context of a complex phylogeny, and different positions along the genome are in linkage disequilibrium. Therefore, an imputation method should be able to predict missing positions from the available sequencing data. RESULTS: We have developed the impuSARS application, which takes advantage of the enormous number of available SARS-CoV-2 genomes, using a reference panel containing 239,301 sequences, to impute missing data in viral genomes. ImpuSARS was tested in a wide range of conditions (with continuous fragments, amplicons or sparse individual positions missing), showing great fidelity when reconstructing the original sequences and recovering the lineage with 100% precision for almost all lineages, even in very poorly covered genomes (<20%). CONCLUSIONS: Imputation can increase the pace of SARS-CoV-2 sequence production by recovering many incomplete or low-quality sequences that would otherwise be discarded. ImpuSARS can be incorporated into any primary data processing pipeline for SARS-CoV-2 whole-genome sequencing.
id pubmed-8643610
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8643610 2021-12-06 Gigascience Technical Note
Oxford University Press 2021-12-02 © The Author(s) 2021. Published by Oxford University Press GigaScience. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Technical Note
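The description field in this record says that positions in linkage disequilibrium allow a reference panel to predict missing bases. The toy Python below illustrates that idea and is not the impuSARS algorithm, whose internals this record does not describe. It fills positions marked 'N' by majority vote among the k reference sequences that agree best with the query's observed bases. The 'N' encoding, the function names `impute` and `hamming_on_observed`, and the default k=3 are assumptions made for illustration only.

```python
from collections import Counter
from typing import List

MISSING = "N"  # assumption: missing or low-quality bases are encoded as 'N'


def hamming_on_observed(query: str, ref: str) -> int:
    """Count mismatches between query and ref, skipping positions missing in the query."""
    return sum(1 for q, r in zip(query, ref) if q != MISSING and q != r)


def impute(query: str, panel: List[str], k: int = 3) -> str:
    """Toy nearest-neighbour imputation (hypothetical helper, not impuSARS):
    fill MISSING positions by majority vote among the k panel sequences
    closest to the query on its observed positions."""
    if not panel:
        raise ValueError("reference panel is empty")
    if any(len(ref) != len(query) for ref in panel):
        raise ValueError("panel sequences must be aligned to the query length")
    # Rank reference sequences by agreement with the observed bases.
    nearest = sorted(panel, key=lambda ref: hamming_on_observed(query, ref))[:k]
    out = []
    for i, base in enumerate(query):
        if base != MISSING:
            out.append(base)  # observed bases are kept as they are
        else:
            votes = Counter(ref[i] for ref in nearest)
            out.append(votes.most_common(1)[0][0])
    return "".join(out)


if __name__ == "__main__":
    panel = ["ACGTAC", "ACGTTC", "TCGAAC", "ACGTAC"]
    print(impute("ACNTNC", panel))  # → ACGTAC
```

In the example, "TCGAAC" disagrees with two observed bases, so it is excluded from the three nearest neighbours. The missing positions are then voted G (3 of 3) and A (2 of 3). A large and diverse panel helps because it makes it more likely that close relatives of the query are present.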