COVID-Align: accurate online alignment of hCoV-19 genomes using a profile HMM

MOTIVATION: The first cases of the COVID-19 pandemic emerged in December 2019. Until the end of February 2020, the number of available genomes was below 1000 and their multiple alignment was easily achieved using standard approaches. Subsequently, the availability of genomes has grown dramatically....

Bibliographic Details
Main authors: Lemoine, Frédéric; Blassel, Luc; Voznica, Jakub; Gascuel, Olivier
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7745650/
https://www.ncbi.nlm.nih.gov/pubmed/33045068
http://dx.doi.org/10.1093/bioinformatics/btaa871
_version_ 1783624649297362944
author Lemoine, Frédéric
Blassel, Luc
Voznica, Jakub
Gascuel, Olivier
author_sort Lemoine, Frédéric
collection PubMed
description MOTIVATION: The first cases of the COVID-19 pandemic emerged in December 2019. Until the end of February 2020, the number of available genomes was below 1000 and their multiple alignment was easily achieved using standard approaches. Subsequently, the availability of genomes has grown dramatically. Moreover, some genomes are of low quality with sequencing/assembly errors, making accurate re-alignment of all genomes nearly impossible on a daily basis. A more efficient, yet accurate approach was clearly required to pursue all subsequent bioinformatics analyses of this crucial data. RESULTS: hCoV-19 genomes are highly conserved, with very few indels and no recombination. This makes the profile HMM approach particularly well suited to align new genomes, add them to an existing alignment and filter problematic ones. Using a core of ∼2500 high quality genomes, we estimated a profile using HMMER, and implemented this profile in COVID-Align, a user-friendly interface to be used online or as standalone via Docker. The alignment of 1000 genomes requires ∼50 minutes on our cluster. Moreover, COVID-Align provides summary statistics, which can be used to determine the sequencing quality and evolutionary novelty of input genomes (e.g. number of new mutations and indels). AVAILABILITY AND IMPLEMENTATION: https://covalign.pasteur.cloud, hub.docker.com/r/evolbioinfo/covid-align. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
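The description mentions summary statistics such as the number of new mutations and indels per input genome. As a rough illustration of that idea, the sketch below counts differences between a reference row and a query row taken from the same alignment. It is not the COVID-Align implementation: the function name `summarize_aligned_genome`, the returned keys, and the use of `-` as the gap character are all assumptions made for this example.

```python
# Illustrative sketch only: NOT the COVID-Align implementation.
# Assumption: both rows come from the same alignment, so they have equal
# length and use '-' as the gap character.

def summarize_aligned_genome(reference: str, query: str) -> dict:
    """Count substitutions, deleted/inserted bases and ambiguous bases."""
    if len(reference) != len(query):
        raise ValueError("aligned rows must have equal length")
    stats = {
        "substitutions": 0,
        "deleted_bases": 0,
        "inserted_bases": 0,
        "ambiguous": 0,
    }
    for ref_base, qry_base in zip(reference.upper(), query.upper()):
        if ref_base == "-" and qry_base == "-":
            continue  # column is gapped in both rows, nothing to count
        if ref_base == "-":
            stats["inserted_bases"] += 1  # base present only in the query
        elif qry_base == "-":
            stats["deleted_bases"] += 1  # base present only in the reference
        elif qry_base not in "ACGT":
            stats["ambiguous"] += 1  # e.g. N from low-quality sequencing
        elif ref_base in "ACGT" and qry_base != ref_base:
            stats["substitutions"] += 1
    return stats


if __name__ == "__main__":
    # Toy rows, invented for this example.
    print(summarize_aligned_genome("ACGT-ACGTAC", "ACTTGAC-NAC"))
```

Counts like these could serve as a simple filter, for instance flagging genomes with an unusually high number of ambiguous bases, although any threshold would have to be chosen by the user.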
format Online
Article
Text
id pubmed-7745650
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7745650 2020-12-21 COVID-Align: accurate online alignment of hCoV-19 genomes using a profile HMM Lemoine, Frédéric Blassel, Luc Voznica, Jakub Gascuel, Olivier Bioinformatics Applications Notes Oxford University Press 2020-10-12 /pmc/articles/PMC7745650/ /pubmed/33045068 http://dx.doi.org/10.1093/bioinformatics/btaa871 Text en © The Author(s) 2020. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title COVID-Align: accurate online alignment of hCoV-19 genomes using a profile HMM
topic Applications Notes