
PASTA for proteins

SUMMARY: PASTA is a multiple sequence alignment method that uses divide-and-conquer plus iteration to enable base alignment methods to scale with high accuracy to large sequence datasets. By default, PASTA included MAFFT L-INS-i; our new extension of PASTA enables the use of MAFFT G-INS-i, MAFFT Homologs, CON...


Bibliographic Details
Main authors:	Collins, Kodi; Warnow, Tandy
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2018
Subjects:	Applications Notes
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6223367/ https://www.ncbi.nlm.nih.gov/pubmed/29931282 http://dx.doi.org/10.1093/bioinformatics/bty495

author	Collins, Kodi; Warnow, Tandy
collection	PubMed
description	SUMMARY: PASTA is a multiple sequence alignment method that uses divide-and-conquer plus iteration to enable base alignment methods to scale with high accuracy to large sequence datasets. By default, PASTA included MAFFT L-INS-i; our new extension of PASTA enables the use of MAFFT G-INS-i, MAFFT Homologs, CONTRAlign and ProbCons. We analyzed the performance of each base method and PASTA using these base methods on 224 datasets from BAliBASE 4 with at least 50 sequences. We show that PASTA enables the most accurate base methods to scale to larger datasets at reduced computational effort, and generally improves alignment and tree accuracy on the largest BAliBASE datasets. AVAILABILITY AND IMPLEMENTATION: PASTA is available at https://github.com/kodicollins/pasta and has also been integrated into the original PASTA repository at https://github.com/smirarab/pasta. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-6223367
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Oxford University Press
record_format	MEDLINE/PubMed
journal	Bioinformatics
dates	2018-11-14, 2018-11-15, 2018-06-19
rights	© The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	PASTA for proteins
topic	Applications Notes
