
Ultra-large alignments using phylogeny-aware profiles

Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very diffic...


Bibliographic Details
Main authors: Nguyen, Nam-phuong D., Mirarab, Siavash, Kumar, Keerthana, Warnow, Tandy
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4492008/
https://www.ncbi.nlm.nih.gov/pubmed/26076734
http://dx.doi.org/10.1186/s13059-015-0688-z
_version_ 1782379720175779840
author Nguyen, Nam-phuong D.
Mirarab, Siavash
Kumar, Keerthana
Warnow, Tandy
author_facet Nguyen, Nam-phuong D.
Mirarab, Siavash
Kumar, Keerthana
Warnow, Tandy
author_sort Nguyen, Nam-phuong D.
collection PubMed
description Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, a multiple sequence alignment method that uses a new machine learning technique, the ensemble of hidden Markov models, which we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-015-0688-z) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4492008
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-44920082015-07-07 Ultra-large alignments using phylogeny-aware profiles Nguyen, Nam-phuong D. Mirarab, Siavash Kumar, Keerthana Warnow, Tandy Genome Biol Method Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, a multiple sequence alignment method that uses a new machine learning technique, the ensemble of hidden Markov models, which we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at https://github.com/smirarab/sepp. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-015-0688-z) contains supplementary material, which is available to authorized users. BioMed Central 2015-06-16 2015 /pmc/articles/PMC4492008/ /pubmed/26076734 http://dx.doi.org/10.1186/s13059-015-0688-z Text en © Nguyen et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License(http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Method
Nguyen, Nam-phuong D.
Mirarab, Siavash
Kumar, Keerthana
Warnow, Tandy
Ultra-large alignments using phylogeny-aware profiles
title Ultra-large alignments using phylogeny-aware profiles
title_full Ultra-large alignments using phylogeny-aware profiles
title_fullStr Ultra-large alignments using phylogeny-aware profiles
title_full_unstemmed Ultra-large alignments using phylogeny-aware profiles
title_short Ultra-large alignments using phylogeny-aware profiles
title_sort ultra-large alignments using phylogeny-aware profiles
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4492008/
https://www.ncbi.nlm.nih.gov/pubmed/26076734
http://dx.doi.org/10.1186/s13059-015-0688-z
work_keys_str_mv AT nguyennamphuongd ultralargealignmentsusingphylogenyawareprofiles
AT mirarabsiavash ultralargealignmentsusingphylogenyawareprofiles
AT kumarkeerthana ultralargealignmentsusingphylogenyawareprofiles
AT warnowtandy ultralargealignmentsusingphylogenyawareprofiles