WITCH-NG: efficient and accurate alignment of datasets with sequence length heterogeneity

SUMMARY: Multiple sequence alignment is a basic part of many bioinformatics pipelines, including in phylogeny estimation, prediction of structure for both RNAs and proteins, and metagenomic sequence analysis. Yet many sequence datasets exhibit substantial sequence length heterogeneity, both because...

Bibliographic Details
Main Authors:	Liu, Baqiao, Warnow, Tandy
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2023
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10035637/ https://www.ncbi.nlm.nih.gov/pubmed/36970502 http://dx.doi.org/10.1093/bioadv/vbad024

_version_	1784911451077476352
author	Liu, Baqiao Warnow, Tandy
author_facet	Liu, Baqiao Warnow, Tandy
author_sort	Liu, Baqiao
collection	PubMed
description	SUMMARY: Multiple sequence alignment is a basic part of many bioinformatics pipelines, including in phylogeny estimation, prediction of structure for both RNAs and proteins, and metagenomic sequence analysis. Yet many sequence datasets exhibit substantial sequence length heterogeneity, both because of large insertions and deletions in the evolutionary history of the sequences and the inclusion of unassembled reads or incompletely assembled sequences in the input. A few methods have been developed that can be highly accurate in aligning datasets with sequence length heterogeneity, with UPP one of the first methods to achieve good accuracy, and WITCH a recent improvement on UPP for accuracy. In this article, we show how we can speed up WITCH. Our improvement includes replacing a critical step in WITCH (currently performed using a heuristic search) by a polynomial time exact algorithm using Smith–Waterman. Our new method, WITCH-NG (i.e. ‘next generation WITCH’) achieves the same accuracy but is substantially faster. WITCH-NG is available at https://github.com/RuneBlaze/WITCH-NG. AVAILABILITY AND IMPLEMENTATION: The datasets used in this study are from prior publications and are freely available in public repositories, as indicated in the Supplementary Materials. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
format	Online Article Text
id	pubmed-10035637
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-10035637 2023-03-24 WITCH-NG: efficient and accurate alignment of datasets with sequence length heterogeneity Liu, Baqiao Warnow, Tandy Bioinform Adv Original Paper Oxford University Press 2023-03-06 /pmc/articles/PMC10035637/ /pubmed/36970502 http://dx.doi.org/10.1093/bioadv/vbad024 Text en © The Author(s) 2023. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Original Paper Liu, Baqiao Warnow, Tandy WITCH-NG: efficient and accurate alignment of datasets with sequence length heterogeneity
title	WITCH-NG: efficient and accurate alignment of datasets with sequence length heterogeneity
title_full	WITCH-NG: efficient and accurate alignment of datasets with sequence length heterogeneity
title_fullStr	WITCH-NG: efficient and accurate alignment of datasets with sequence length heterogeneity
title_full_unstemmed	WITCH-NG: efficient and accurate alignment of datasets with sequence length heterogeneity
title_short	WITCH-NG: efficient and accurate alignment of datasets with sequence length heterogeneity
title_sort	witch-ng: efficient and accurate alignment of datasets with sequence length heterogeneity
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10035637/ https://www.ncbi.nlm.nih.gov/pubmed/36970502 http://dx.doi.org/10.1093/bioadv/vbad024
work_keys_str_mv	AT liubaqiao witchngefficientandaccuratealignmentofdatasetswithsequencelengthheterogeneity AT warnowtandy witchngefficientandaccuratealignmentofdatasetswithsequencelengthheterogeneity
