COVID-Align: accurate online alignment of hCoV-19 genomes using a profile HMM
MOTIVATION: The first cases of the COVID-19 pandemic emerged in December 2019. Until the end of February 2020, the number of available genomes was below 1000 and their multiple alignment was easily achieved using standard approaches. Subsequently, the availability of genomes has grown dramatically....
Main authors: | Lemoine, Frédéric; Blassel, Luc; Voznica, Jakub; Gascuel, Olivier |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Applications Notes |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7745650/ https://www.ncbi.nlm.nih.gov/pubmed/33045068 http://dx.doi.org/10.1093/bioinformatics/btaa871 |
_version_ | 1783624649297362944 |
---|---|
author | Lemoine, Frédéric Blassel, Luc Voznica, Jakub Gascuel, Olivier |
author_facet | Lemoine, Frédéric Blassel, Luc Voznica, Jakub Gascuel, Olivier |
author_sort | Lemoine, Frédéric |
collection | PubMed |
description | MOTIVATION: The first cases of the COVID-19 pandemic emerged in December 2019. Until the end of February 2020, the number of available genomes was below 1000 and their multiple alignment was easily achieved using standard approaches. Subsequently, the availability of genomes has grown dramatically. Moreover, some genomes are of low quality with sequencing/assembly errors, making accurate re-alignment of all genomes nearly impossible on a daily basis. A more efficient, yet accurate approach was clearly required to pursue all subsequent bioinformatics analyses of this crucial data. RESULTS: hCoV-19 genomes are highly conserved, with very few indels and no recombination. This makes the profile HMM approach particularly well suited to align new genomes, add them to an existing alignment and filter problematic ones. Using a core of ∼2500 high quality genomes, we estimated a profile using HMMER, and implemented this profile in COVID-Align, a user-friendly interface to be used online or as standalone via Docker. The alignment of 1000 genomes requires ∼50 minutes on our cluster. Moreover, COVID-Align provides summary statistics, which can be used to determine the sequencing quality and evolutionary novelty of input genomes (e.g. number of new mutations and indels). AVAILABILITY AND IMPLEMENTATION: https://covalign.pasteur.cloud, hub.docker.com/r/evolbioinfo/covid-align. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-7745650 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-77456502020-12-21 COVID-Align: accurate online alignment of hCoV-19 genomes using a profile HMM Lemoine, Frédéric Blassel, Luc Voznica, Jakub Gascuel, Olivier Bioinformatics Applications Notes MOTIVATION: The first cases of the COVID-19 pandemic emerged in December 2019. Until the end of February 2020, the number of available genomes was below 1000 and their multiple alignment was easily achieved using standard approaches. Subsequently, the availability of genomes has grown dramatically. Moreover, some genomes are of low quality with sequencing/assembly errors, making accurate re-alignment of all genomes nearly impossible on a daily basis. A more efficient, yet accurate approach was clearly required to pursue all subsequent bioinformatics analyses of this crucial data. RESULTS: hCoV-19 genomes are highly conserved, with very few indels and no recombination. This makes the profile HMM approach particularly well suited to align new genomes, add them to an existing alignment and filter problematic ones. Using a core of ∼2500 high quality genomes, we estimated a profile using HMMER, and implemented this profile in COVID-Align, a user-friendly interface to be used online or as standalone via Docker. The alignment of 1000 genomes requires ∼50 minutes on our cluster. Moreover, COVID-Align provides summary statistics, which can be used to determine the sequencing quality and evolutionary novelty of input genomes (e.g. number of new mutations and indels). AVAILABILITY AND IMPLEMENTATION: https://covalign.pasteur.cloud, hub.docker.com/r/evolbioinfo/covid-align. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2020-10-12 /pmc/articles/PMC7745650/ /pubmed/33045068 http://dx.doi.org/10.1093/bioinformatics/btaa871 Text en © The Author(s) 2020. Published by Oxford University Press. 
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Applications Notes Lemoine, Frédéric Blassel, Luc Voznica, Jakub Gascuel, Olivier COVID-Align: accurate online alignment of hCoV-19 genomes using a profile HMM |
title | COVID-Align: accurate online alignment of hCoV-19 genomes using a profile HMM |
title_full | COVID-Align: accurate online alignment of hCoV-19 genomes using a profile HMM |
title_fullStr | COVID-Align: accurate online alignment of hCoV-19 genomes using a profile HMM |
title_full_unstemmed | COVID-Align: accurate online alignment of hCoV-19 genomes using a profile HMM |
title_short | COVID-Align: accurate online alignment of hCoV-19 genomes using a profile HMM |
title_sort | covid-align: accurate online alignment of hcov-19 genomes using a profile hmm |
topic | Applications Notes |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7745650/ https://www.ncbi.nlm.nih.gov/pubmed/33045068 http://dx.doi.org/10.1093/bioinformatics/btaa871 |
work_keys_str_mv | AT lemoinefrederic covidalignaccurateonlinealignmentofhcov19genomesusingaprofilehmm AT blasselluc covidalignaccurateonlinealignmentofhcov19genomesusingaprofilehmm AT voznicajakub covidalignaccurateonlinealignmentofhcov19genomesusingaprofilehmm AT gascuelolivier covidalignaccurateonlinealignmentofhcov19genomesusingaprofilehmm |
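
The abstract in this record mentions per-genome summary statistics such as the number of new mutations and indels. As a rough illustration of what such a summary could involve, here is a minimal Python sketch that counts substitutions, insertion events, deletion events and ambiguous bases between an aligned query and an aligned reference. It is not COVID-Align's implementation: the function name `summarize_pairwise`, the `AlignmentStats` container, and the convention that `-` marks a gap and `N` an ambiguous base are all assumptions made for this example.

```python
from dataclasses import dataclass

GAP = "-"        # assumed gap character in the aligned sequences
AMBIGUOUS = "N"  # assumed symbol for an uncalled or ambiguous base


@dataclass
class AlignmentStats:
    """Hypothetical container for per-genome summary counts."""
    substitutions: int
    insertions: int  # runs where the query has bases and the reference has gaps
    deletions: int   # runs where the reference has bases and the query has gaps
    ambiguous: int   # query positions reported as N


def summarize_pairwise(reference: str, query: str) -> AlignmentStats:
    """Count differences between two rows of the same alignment."""
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have the same length")

    subs = ins = dels = amb = 0
    in_insertion = in_deletion = False

    for ref_base, query_base in zip(reference.upper(), query.upper()):
        if ref_base == GAP and query_base == GAP:
            # Column is empty in both rows: ignore it entirely.
            continue
        if ref_base == GAP:
            # Count each contiguous insertion run once.
            if not in_insertion:
                ins += 1
            in_insertion, in_deletion = True, False
        elif query_base == GAP:
            # Count each contiguous deletion run once.
            if not in_deletion:
                dels += 1
            in_deletion, in_insertion = True, False
        else:
            in_insertion = in_deletion = False
            if query_base == AMBIGUOUS:
                amb += 1
            elif ref_base != query_base:
                subs += 1

    return AlignmentStats(subs, ins, dels, amb)


if __name__ == "__main__":
    # Toy sequences, not real hCoV-19 data.
    stats = summarize_pairwise("ACGT-ACGTAC", "ACTTAAC--AN")
    print(stats)
```

Counting indels as runs rather than as individual gap columns keeps a single multi-base deletion from being reported as several events; a genome with unusually high counts in any field could then be flagged for manual review.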