
Application of the MAFFT sequence alignment program to large data—reexamination of the usefulness of chained guide trees

Motivation: Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming more and more common, due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performances have not been sufficiently assessed ye...


Bibliographic Details
Main Authors: Yamada, Kazunori D., Tomii, Kentaro, Katoh, Kazutaka
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5079479/
https://www.ncbi.nlm.nih.gov/pubmed/27378296
http://dx.doi.org/10.1093/bioinformatics/btw412
_version_ 1782462559662637056
author Yamada, Kazunori D.
Tomii, Kentaro
Katoh, Kazutaka
author_facet Yamada, Kazunori D.
Tomii, Kentaro
Katoh, Kazutaka
author_sort Yamada, Kazunori D.
collection PubMed
description Motivation: Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming more and more common, due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performance has not yet been sufficiently assessed, because realistic benchmarking of large MSAs has been difficult. Recently, such assessments have been made possible through the HomFam and ContTest benchmark protein datasets. Along with the development of these datasets, an interesting theory was proposed: chained guide trees increase the accuracy of MSAs of structurally conserved regions. This theory challenges the basis of progressive alignment methods and needs to be examined by comparison with other known methods, including computationally intensive ones. Results: We used HomFam, ContTest and OXFam (an extended version of OXBench) to evaluate several methods enabled in MAFFT: (1) a progressive method with approximate guide trees, (2) a progressive method with chained guide trees, (3) a combination of an iterative refinement method and a progressive method and (4) a less approximate progressive method that uses a rigorous guide tree and consistency score. Other programs available for large MSAs, Clustal Omega and UPP, were also included in the comparison. The effect of method 2 (chained guide trees) was positive in ContTest but negative in HomFam and OXFam. Methods 3 and 4 increased the benchmark scores more consistently than method 2 for the three datasets, suggesting that they are safer to use. Availability and Implementation: http://mafft.cbrc.jp/alignment/software/ Contact: katoh@ifrec.osaka-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5079479
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-50794792016-10-26 Application of the MAFFT sequence alignment program to large data—reexamination of the usefulness of chained guide trees Yamada, Kazunori D. Tomii, Kentaro Katoh, Kazutaka Bioinformatics Original Papers Motivation: Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming more and more common, due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performances have not been sufficiently assessed yet, because realistic benchmarking of large MSAs has been difficult. Recently, such assessments have been made possible through the HomFam and ContTest benchmark protein datasets. Along with the development of these datasets, an interesting theory was proposed: chained guide trees increase the accuracy of MSAs of structurally conserved regions. This theory challenges the basis of progressive alignment methods and needs to be examined by being compared with other known methods including computationally intensive ones. Results: We used HomFam, ContTest and OXFam (an extended version of OXBench) to evaluate several methods enabled in MAFFT: (1) a progressive method with approximate guide trees, (2) a progressive method with chained guide trees, (3) a combination of an iterative refinement method and a progressive method and (4) a less approximate progressive method that uses a rigorous guide tree and consistency score. Other programs, Clustal Omega and UPP, available for large MSAs, were also included into the comparison. The effect of method 2 (chained guide trees) was positive in ContTest but negative in HomFam and OXFam. Methods 3 and 4 increased the benchmark scores more consistently than method 2 for the three datasets, suggesting that they are safer to use. 
Availability and Implementation: http://mafft.cbrc.jp/alignment/software/ Contact: katoh@ifrec.osaka-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2016-11-01 2016-07-04 /pmc/articles/PMC5079479/ /pubmed/27378296 http://dx.doi.org/10.1093/bioinformatics/btw412 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Original Papers
Yamada, Kazunori D.
Tomii, Kentaro
Katoh, Kazutaka
Application of the MAFFT sequence alignment program to large data—reexamination of the usefulness of chained guide trees
title Application of the MAFFT sequence alignment program to large data—reexamination of the usefulness of chained guide trees
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5079479/
https://www.ncbi.nlm.nih.gov/pubmed/27378296
http://dx.doi.org/10.1093/bioinformatics/btw412
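Illustrative note: the description above contrasts chained guide trees with the guide trees normally used in progressive alignment. As a rough sketch of what "chained" means topologically, the Python below builds a fully unbalanced (chained) tree and a balanced tree in Newick notation and compares their nesting depth. The function names, the use of input order for the chain, and the balanced tree used for contrast are all assumptions made for illustration; none of this is taken from MAFFT or from the article.

```python
def chained_guide_tree(names):
    """Newick string for a chained (fully unbalanced) tree.

    Assumption for illustration: sequences are added one at a time,
    in the order given, to the group built so far.
    """
    if not names:
        raise ValueError("at least one sequence name is required")
    tree = names[0]
    for name in names[1:]:
        tree = f"({tree},{name})"
    return tree + ";"


def balanced_guide_tree(names):
    """Newick string for a balanced tree, shown only for contrast."""
    if not names:
        raise ValueError("at least one sequence name is required")

    def build(group):
        # Split the group in half and join the two subtrees.
        if len(group) == 1:
            return group[0]
        mid = len(group) // 2
        return f"({build(group[:mid])},{build(group[mid:])})"

    return build(list(names)) + ";"


def nesting_depth(newick):
    """Maximum parenthesis nesting depth of a Newick string."""
    depth = deepest = 0
    for ch in newick:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest


if __name__ == "__main__":
    seqs = ["seq1", "seq2", "seq3", "seq4"]
    print(chained_guide_tree(seqs))   # (((seq1,seq2),seq3),seq4);
    print(balanced_guide_tree(seqs))  # ((seq1,seq2),(seq3,seq4));
```

For N sequences the chained tree has nesting depth N - 1, whereas the balanced tree has depth of about log2(N). This is why a chained tree aligns sequences strictly one after another rather than merging similar subgroups first.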