Large multiple sequence alignments with a root-to-leaf regressive method

Multiple sequence alignments (MSAs) are used for structural(1,2) and evolutionary predictions(1,2), but the complexity of aligning large datasets requires the use of approximate solutions(3), including the progressive algorithm(4). Progressive MSA methods start by aligning the most similar sequences...

Bibliographic Details
Main Authors: Garriga, Edgar; Di Tommaso, Paolo; Magis, Cedrik; Erb, Ionas; Mansouri, Leila; Baltzis, Athanasios; Laayouni, Hafid; Kondrashov, Fyodor; Floden, Evan; Notredame, Cedric
Format: Online, Article, Text
Language: English
Published: 2019
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6894943/
https://www.ncbi.nlm.nih.gov/pubmed/31792410
http://dx.doi.org/10.1038/s41587-019-0333-6
collection PubMed
description Multiple sequence alignments (MSAs) are used for structural(1,2) and evolutionary predictions(1,2), but the complexity of aligning large datasets requires the use of approximate solutions(3), including the progressive algorithm(4). Progressive MSA methods start by aligning the most similar sequences and subsequently incorporate the remaining sequences, from leaf-to-root, based on a guide-tree. Their accuracy declines substantially as the number of sequences is scaled up(5). We introduce a regressive algorithm that enables MSA of up to 1.4 million sequences on a standard workstation and substantially improves accuracy on datasets larger than 10,000 sequences. Our regressive algorithm works the other way around to the progressive algorithm and begins by aligning the most dissimilar sequences. It uses an efficient divide-and-conquer strategy to run third-party alignment methods in linear time, regardless of their original complexity. Our approach will enable analyses of extremely large genomic datasets such as the recently announced Earth BioGenome Project, which comprises 1.5 million eukaryotic genomes(6).
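The description outlines the regressive idea only in words. The Python code here is a minimal sketch of a root-to-leaf decomposition under stated assumptions. It is not the authors' implementation and performs no alignment itself. It assumes a binary guide tree written as nested tuples, a hypothetical bucket size `n_max` (the largest number of sequences handed to a third-party aligner at once), a breadth-first cut of each subtree starting at its root, and a leader rule that picks the longest sequence of each subtree. All function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple, Union

# Hypothetical guide-tree encoding: a leaf is a sequence name,
# an internal node is a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaves(node: Tree) -> List[str]:
    """Sequence names under a node, left to right."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def split_from_root(node: Tree, n_max: int) -> List[Tree]:
    """Cut a subtree into at most n_max pieces, expanding nodes closest to its root first."""
    done: List[Tree] = []   # leaves, which cannot be split further
    queue = deque([node])   # pieces still eligible for splitting, in breadth-first order
    while queue and len(done) + len(queue) < n_max:
        current = queue.popleft()
        if isinstance(current, str):
            done.append(current)
        else:
            queue.extend(current)  # replace the node by its two children (net +1 piece)
    return done + list(queue)


def leader(node: Tree, lengths: Dict[str, int]) -> str:
    # Assumption of this sketch: the longest sequence represents its subtree;
    # ties are broken by name so the choice is deterministic.
    return max(leaves(node), key=lambda name: (lengths[name], name))


def regressive_jobs(tree: Tree, lengths: Dict[str, int], n_max: int) -> List[List[str]]:
    """List the independent alignment jobs; each subtree is handled after its parent."""
    assert n_max >= 2, "each job must split its subtree into at least two pieces"
    jobs: List[List[str]] = []
    pending: List[Tree] = [tree]
    while pending:
        current = pending.pop()
        if isinstance(current, str):
            continue  # a single sequence needs no alignment
        pieces = split_from_root(current, n_max)
        jobs.append([leader(piece, lengths) for piece in pieces])
        pending.extend(pieces)
    return jobs


tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
lengths = {"a": 120, "b": 95, "c": 130, "d": 80, "e": 110, "f": 140, "g": 90, "h": 100}
for job in regressive_jobs(tree, lengths, n_max=3):
    print(job)
# ['f', 'a', 'c']
# ['c', 'd']
# ['a', 'b']
# ['h', 'e', 'f']
# ['g', 'h']
```

Each job splits one subtree into between 2 and `n_max` pieces, and the process ends with one piece per sequence, so a tree with n sequences yields at most n − 1 jobs of at most `n_max` sequences each. For a fixed `n_max`, the number of aligner calls therefore grows linearly with n, whatever the cost of the aligner on a single job. With the longest-sequence rule, every job also contains the leader of the subtree it splits, which is the sequence it shares with its parent job. A full implementation would use such shared sequences to merge the sub-alignments, a step this sketch omits.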
id pubmed-6894943
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source Nat Biotechnol, Article, published 2019-12-02 (issue date 2019-12); record pubmed-6894943, 2019-12-05
rights Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms