Using de novo protein structure predictions to measure the quality of very large multiple sequence alignments

Motivation: Multiple sequence alignments (MSAs) with large numbers of sequences are now commonplace. However, current multiple alignment benchmarks are ill-suited for testing these types of alignments, as test cases either contain a very small number of sequences or are based purely on simulation ra...

Bibliographic Details
Main Authors: Fox, Gearóid; Sievers, Fabian; Higgins, Desmond G.
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5939968/
https://www.ncbi.nlm.nih.gov/pubmed/26568625
http://dx.doi.org/10.1093/bioinformatics/btv592
author Fox, Gearóid
Sievers, Fabian
Higgins, Desmond G.
author_facet Fox, Gearóid
Sievers, Fabian
Higgins, Desmond G.
author_sort Fox, Gearóid
collection PubMed
description Motivation: Multiple sequence alignments (MSAs) with large numbers of sequences are now commonplace. However, current multiple alignment benchmarks are ill-suited for testing these types of alignments, as test cases either contain a very small number of sequences or are based purely on simulation rather than empirical data. Results: We take advantage of recent developments in protein structure prediction methods to create a benchmark (ContTest) for protein MSAs containing many thousands of sequences in each test case and which is based on empirical biological data. We rank popular MSA methods using this benchmark and verify a recent result showing that chained guide trees increase the accuracy of progressive alignment packages on datasets with thousands of proteins. Availability and implementation: Benchmark data and scripts are available for download at http://www.bioinf.ucd.ie/download/ContTest.tar.gz. Contact: des.higgins@ucd.ie Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5939968
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-59399682018-08-07 Using de novo protein structure predictions to measure the quality of very large multiple sequence alignments Fox, Gearóid Sievers, Fabian Higgins, Desmond G. Bioinformatics Original Papers Motivation: Multiple sequence alignments (MSAs) with large numbers of sequences are now commonplace. However, current multiple alignment benchmarks are ill-suited for testing these types of alignments, as test cases either contain a very small number of sequences or are based purely on simulation rather than empirical data. Results: We take advantage of recent developments in protein structure prediction methods to create a benchmark (ContTest) for protein MSAs containing many thousands of sequences in each test case and which is based on empirical biological data. We rank popular MSA methods using this benchmark and verify a recent result showing that chained guide trees increase the accuracy of progressive alignment packages on datasets with thousands of proteins. Availability and implementation: Benchmark data and scripts are available for download at http://www.bioinf.ucd.ie/download/ContTest.tar.gz. Contact: des.higgins@ucd.ie Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2016-03-15 2015-11-14 /pmc/articles/PMC5939968/ /pubmed/26568625 http://dx.doi.org/10.1093/bioinformatics/btv592 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Papers
Fox, Gearóid
Sievers, Fabian
Higgins, Desmond G.
Using de novo protein structure predictions to measure the quality of very large multiple sequence alignments
title Using de novo protein structure predictions to measure the quality of very large multiple sequence alignments
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5939968/
https://www.ncbi.nlm.nih.gov/pubmed/26568625
http://dx.doi.org/10.1093/bioinformatics/btv592