MAGUS: Multiple sequence Alignment using Graph clUStering

MOTIVATION: The estimation of large multiple sequence alignments (MSAs) is a basic bioinformatics challenge. Divide-and-conquer is a useful approach that has been shown to improve the scalability and accuracy of MSA estimation in established methods such as SATé and PASTA. In these divide-and-conque...

Bibliographic Details
Main Authors:	Smirnov, Vladimir; Warnow, Tandy
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Original Papers
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8289385/ https://www.ncbi.nlm.nih.gov/pubmed/33252662 http://dx.doi.org/10.1093/bioinformatics/btaa992

_version_	1783724290164654080
author	Smirnov, Vladimir Warnow, Tandy
author_facet	Smirnov, Vladimir Warnow, Tandy
author_sort	Smirnov, Vladimir
collection	PubMed
description	MOTIVATION: The estimation of large multiple sequence alignments (MSAs) is a basic bioinformatics challenge. Divide-and-conquer is a useful approach that has been shown to improve the scalability and accuracy of MSA estimation in established methods such as SATé and PASTA. In these divide-and-conquer strategies, a sequence dataset is divided into disjoint subsets, alignments are computed on the subsets using base MSA methods (e.g. MAFFT), and then merged together into an alignment on the full dataset. RESULTS: We present MAGUS, Multiple sequence Alignment using Graph clUStering, a new technique for computing large-scale alignments. MAGUS is similar to PASTA in that it uses nearly the same initial steps (starting tree, similar decomposition strategy, and MAFFT to compute subset alignments), but then merges the subset alignments using the Graph Clustering Merger, a new method for combining disjoint alignments that we present in this study. Our study, on a heterogeneous collection of biological and simulated datasets, shows that MAGUS produces improved accuracy and is faster than PASTA on large datasets, and matches it on smaller datasets. AVAILABILITY AND IMPLEMENTATION: MAGUS: https://github.com/vlasmirnov/MAGUS SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-8289385
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-82893852021-07-20 MAGUS: Multiple sequence Alignment using Graph clUStering Smirnov, Vladimir Warnow, Tandy Bioinformatics Original Papers MOTIVATION: The estimation of large multiple sequence alignments (MSAs) is a basic bioinformatics challenge. Divide-and-conquer is a useful approach that has been shown to improve the scalability and accuracy of MSA estimation in established methods such as SATé and PASTA. In these divide-and-conquer strategies, a sequence dataset is divided into disjoint subsets, alignments are computed on the subsets using base MSA methods (e.g. MAFFT), and then merged together into an alignment on the full dataset. RESULTS: We present MAGUS, Multiple sequence Alignment using Graph clUStering, a new technique for computing large-scale alignments. MAGUS is similar to PASTA in that it uses nearly the same initial steps (starting tree, similar decomposition strategy, and MAFFT to compute subset alignments), but then merges the subset alignments using the Graph Clustering Merger, a new method for combining disjoint alignments that we present in this study. Our study, on a heterogeneous collection of biological and simulated datasets, shows that MAGUS produces improved accuracy and is faster than PASTA on large datasets, and matches it on smaller datasets. AVAILABILITY AND IMPLEMENTATION: MAGUS: https://github.com/vlasmirnov/MAGUS SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2020-11-30 /pmc/articles/PMC8289385/ /pubmed/33252662 http://dx.doi.org/10.1093/bioinformatics/btaa992 Text en © The Author(s) 2020. Published by Oxford University Press. 
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle	Original Papers Smirnov, Vladimir Warnow, Tandy MAGUS: Multiple sequence Alignment using Graph clUStering
title	MAGUS: Multiple sequence Alignment using Graph clUStering
title_full	MAGUS: Multiple sequence Alignment using Graph clUStering
title_fullStr	MAGUS: Multiple sequence Alignment using Graph clUStering
title_full_unstemmed	MAGUS: Multiple sequence Alignment using Graph clUStering
title_short	MAGUS: Multiple sequence Alignment using Graph clUStering
title_sort	magus: multiple sequence alignment using graph clustering
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8289385/ https://www.ncbi.nlm.nih.gov/pubmed/33252662 http://dx.doi.org/10.1093/bioinformatics/btaa992
work_keys_str_mv	AT smirnovvladimir magusmultiplesequencealignmentusinggraphclustering AT warnowtandy magusmultiplesequencealignmentusinggraphclustering