Large multiple sequence alignments with a root-to-leaf regressive method
Multiple sequence alignments (MSAs) are used for structural(1,2) and evolutionary predictions(1,2), but the complexity of aligning large datasets requires the use of approximate solutions(3), including the progressive algorithm(4). Progressive MSA methods start by aligning the most similar sequences...
Main authors: | Garriga, Edgar; Di Tommaso, Paolo; Magis, Cedrik; Erb, Ionas; Mansouri, Leila; Baltzis, Athanasios; Laayouni, Hafid; Kondrashov, Fyodor; Floden, Evan; Notredame, Cedric |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6894943/ https://www.ncbi.nlm.nih.gov/pubmed/31792410 http://dx.doi.org/10.1038/s41587-019-0333-6 |
_version_ | 1783476491413094400 |
---|---|
author | Garriga, Edgar Di Tommaso, Paolo Magis, Cedrik Erb, Ionas Mansouri, Leila Baltzis, Athanasios Laayouni, Hafid Kondrashov, Fyodor Floden, Evan Notredame, Cedric |
author_sort | Garriga, Edgar |
collection | PubMed |
description | Multiple sequence alignments (MSAs) are used for structural(1,2) and evolutionary predictions(1,2), but the complexity of aligning large datasets requires the use of approximate solutions(3), including the progressive algorithm(4). Progressive MSA methods start by aligning the most similar sequences and subsequently incorporate the remaining sequences, from leaf-to-root, based on a guide-tree. Their accuracy declines substantially as the number of sequences is scaled up(5). We introduce a regressive algorithm that enables MSA of up to 1.4 million sequences on a standard workstation and substantially improves accuracy on datasets larger than 10,000 sequences. Our regressive algorithm works the other way around to the progressive algorithm and begins by aligning the most dissimilar sequences. It uses an efficient divide-and-conquer strategy to run third-party alignment methods in linear time, regardless of their original complexity. Our approach will enable analyses of extremely large genomic datasets such as the recently announced Earth BioGenome Project, which comprises 1.5 million eukaryotic genomes(6). |
format | Online Article Text |
id | pubmed-6894943 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
record_format | MEDLINE/PubMed |
spelling | pubmed-6894943 2019-12-05 Large multiple sequence alignments with a root-to-leaf regressive method Nat Biotechnol Article 2019-12-02 2019-12 /pmc/articles/PMC6894943/ /pubmed/31792410 http://dx.doi.org/10.1038/s41587-019-0333-6 Text en http://www.nature.com/authors/editorial_policies/license.html#terms Users may view, print, copy, and download text and data-mine the content in such documents, for the purposes of academic research, subject always to the full Conditions of use: http://www.nature.com/authors/editorial_policies/license.html#terms |
title | Large multiple sequence alignments with a root-to-leaf regressive method |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6894943/ https://www.ncbi.nlm.nih.gov/pubmed/31792410 http://dx.doi.org/10.1038/s41587-019-0333-6 |
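
The abstract contrasts the leaf-to-root order of progressive methods with the root-to-leaf order of the regressive method. The Python sketch below illustrates only that difference in traversal order over a guide tree. The four-sequence tree `GUIDE_TREE` and the function names `leaf_to_root` and `root_to_leaf` are hypothetical and do not come from the paper. The code performs no alignment and is not the authors' implementation.

```python
from collections import deque

# Hypothetical toy guide tree: each internal node maps to its children.
# Names that never appear as keys (seqA..seqD) are leaves, i.e. sequences.
GUIDE_TREE = {
    "root": ["n1", "n2"],
    "n1": ["seqA", "seqB"],
    "n2": ["seqC", "seqD"],
}


def leaf_to_root(tree, node="root"):
    """Post-order visit: children before their parent (progressive direction)."""
    order = []
    for child in tree.get(node, []):
        order.extend(leaf_to_root(tree, child))
    if node in tree:  # record internal nodes only
        order.append(node)
    return order


def root_to_leaf(tree, node="root"):
    """Breadth-first visit from the root: parents before children (regressive direction)."""
    order = []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        if current in tree:  # record internal nodes only
            order.append(current)
            queue.extend(tree[current])
    return order


print(leaf_to_root(GUIDE_TREE))  # ['n1', 'n2', 'root']
print(root_to_leaf(GUIDE_TREE))  # ['root', 'n1', 'n2']
```

In the leaf-to-root order, the nodes joining the most similar sequences (`n1`, `n2`) are handled first and the root last. The root-to-leaf order starts at the root, where the most dissimilar groups meet.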