MAGUS+eHMMs: improved multiple sequence alignment accuracy for fragmentary sequences
Main authors: | Shen, Chengze; Zaharias, Paul; Warnow, Tandy |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8796358/ https://www.ncbi.nlm.nih.gov/pubmed/34791036 http://dx.doi.org/10.1093/bioinformatics/btab788 |
author | Shen, Chengze; Zaharias, Paul; Warnow, Tandy |
---|---|
collection | PubMed |
description | SUMMARY: Multiple sequence alignment (MSA) is an initial step in many bioinformatics pipelines, including phylogeny estimation, protein structure prediction and taxonomic identification of reads produced in amplicon or metagenomic datasets. Yet alignment estimation is challenging on datasets that exhibit substantial sequence length heterogeneity, especially when the datasets contain fragmentary sequences resulting from the inclusion of reads or contigs generated by next-generation sequencing technologies. Here, we examine techniques that have been developed to improve alignment estimation when datasets contain substantial numbers of fragmentary sequences. We find that MAGUS, a recently developed MSA method, is fairly robust to fragmentary sequences under many conditions, and that a two-stage approach, in which MAGUS aligns selected ‘backbone sequences’ and the remaining sequences are added into the alignment using ensembles of Hidden Markov Models, further improves alignment accuracy. The combination of MAGUS with an ensemble of HMMs (eHMMs), i.e. MAGUS+eHMMs, clearly improves on UPP, the previous leading method for aligning datasets with high levels of fragmentation. AVAILABILITY AND IMPLEMENTATION: UPP is available at https://github.com/smirarab/sepp, and MAGUS is available at https://github.com/vlasmirnov/MAGUS. MAGUS+eHMMs can be run by using MAGUS to obtain the backbone alignment and then supplying that backbone alignment as input to UPP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-8796358 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
journal | Bioinformatics |
published | 2021-11-17 |
copyright | © The Author(s) 2021. Published by Oxford University Press. |
license | This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | MAGUS+eHMMs: improved multiple sequence alignment accuracy for fragmentary sequences |
topic | Original Papers |
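
The two-stage approach summarized in the abstract begins by deciding which sequences form the backbone that MAGUS aligns; the rest are added afterwards with UPP's ensemble of HMMs. The record does not say how backbone sequences are selected, so the Python sketch below assumes a hypothetical length-based rule, loosely modelled on a UPP-style filter: keep sequences within 25% of the median length, cap the backbone at 1000, and subsample at random if needed. It only shows the split into backbone and query sets; `split_backbone`, `to_fasta` and both constants are illustrative names, not part of MAGUS or UPP.

```python
import random
import statistics
from typing import Dict, List, Tuple

# Hypothetical defaults (assumptions, not values taken from the MAGUS+eHMMs paper).
LENGTH_TOLERANCE = 0.25   # keep lengths within +/-25% of the median length
MAX_BACKBONE = 1000       # cap on the number of backbone sequences


def split_backbone(seqs: Dict[str, str],
                   tolerance: float = LENGTH_TOLERANCE,
                   max_size: int = MAX_BACKBONE,
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split sequence ids into (backbone, query) lists.

    Backbone candidates are 'full-length' sequences whose length lies within
    `tolerance` of the median length; everything else (e.g. fragmentary reads)
    becomes a query sequence to be added in the second stage.
    """
    median = statistics.median(len(s) for s in seqs.values())
    low, high = (1 - tolerance) * median, (1 + tolerance) * median
    candidates = sorted(n for n, s in seqs.items() if low <= len(s) <= high)
    if len(candidates) > max_size:
        # Reproducible random subsample when there are too many candidates.
        candidates = sorted(random.Random(seed).sample(candidates, max_size))
    chosen = set(candidates)
    queries = sorted(n for n in seqs if n not in chosen)
    return candidates, queries


def to_fasta(seqs: Dict[str, str], names: List[str]) -> str:
    """Render the named sequences as FASTA text (one line per sequence)."""
    return "".join(f">{n}\n{seqs[n]}\n" for n in names)


if __name__ == "__main__":
    demo = {
        "full1": "ACGTACGTAC",
        "full2": "ACGTACGTTC",
        "full3": "ACGTACGAC",
        "frag1": "ACGT",
        "frag2": "GTAC",
    }
    backbone, queries = split_backbone(demo)
    print(backbone)  # stage 1: sequences to align with MAGUS
    print(queries)   # stage 2: sequences to add with UPP's HMM ensemble
    print(to_fasta(demo, backbone), end="")
```

In the workflow the abstract describes, the backbone FASTA would then be aligned with MAGUS and the resulting alignment supplied to UPP as its backbone alignment, with the query sequences added in that second step; the actual command-line options should be taken from the MAGUS and SEPP/UPP repositories rather than from this sketch.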