
Core column prediction for protein multiple sequence alignments

BACKGROUND: In a computed protein multiple sequence alignment, the coreness of a column is the fraction of its substitutions that are in so-called core columns of the gold-standard reference alignment of its proteins. In benchmark suites of protein reference alignments, the core columns of the refer...


Bibliographic Details
Main Authors:	DeBlasio, Dan; Kececioglu, John
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5397798/ https://www.ncbi.nlm.nih.gov/pubmed/28435440 http://dx.doi.org/10.1186/s13015-017-0102-3

_version_	1783230338768568320
author	DeBlasio, Dan Kececioglu, John
author_facet	DeBlasio, Dan Kececioglu, John
author_sort	DeBlasio, Dan
collection	PubMed
description	BACKGROUND: In a computed protein multiple sequence alignment, the coreness of a column is the fraction of its substitutions that are in so-called core columns of the gold-standard reference alignment of its proteins. In benchmark suites of protein reference alignments, the core columns of the reference alignment are those that can be confidently labeled as correct, usually due to all residues in the column being sufficiently close in the spatial superposition of the known three-dimensional structures of the proteins. Typically the accuracy of a protein multiple sequence alignment that has been computed for a benchmark is only measured with respect to the core columns of the reference alignment. When computing an alignment in practice, however, a reference alignment is not known, so the coreness of its columns can only be predicted. RESULTS: We develop for the first time a predictor of column coreness for protein multiple sequence alignments. This allows us to predict which columns of a computed alignment are core, and hence better estimate the alignment’s accuracy. Our approach to predicting coreness is similar to nearest-neighbor classification from machine learning, except we transform nearest-neighbor distances into a coreness prediction via a regression function, and we learn an appropriate distance function through a new optimization formulation that solves a large-scale linear programming problem. We apply our coreness predictor to parameter advising, the task of choosing parameter values for an aligner’s scoring function to obtain a more accurate alignment of a specific set of sequences. We show that for this task, our predictor strongly outperforms other column-confidence estimators from the literature, and affords a substantial boost in alignment accuracy.
format	Online Article Text
id	pubmed-5397798
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5397798 2017-04-21 Core column prediction for protein multiple sequence alignments DeBlasio, Dan Kececioglu, John Algorithms Mol Biol Research
BioMed Central 2017-04-19 /pmc/articles/PMC5397798/ /pubmed/28435440 http://dx.doi.org/10.1186/s13015-017-0102-3 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research DeBlasio, Dan Kececioglu, John Core column prediction for protein multiple sequence alignments
title	Core column prediction for protein multiple sequence alignments
title_sort	core column prediction for protein multiple sequence alignments
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5397798/ https://www.ncbi.nlm.nih.gov/pubmed/28435440 http://dx.doi.org/10.1186/s13015-017-0102-3
work_keys_str_mv	AT deblasiodan corecolumnpredictionforproteinmultiplesequencealignments AT kececioglujohn corecolumnpredictionforproteinmultiplesequencealignments
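
The definition of coreness given in the description above (the fraction of a computed column's substitutions that lie in core columns of the reference alignment) can be illustrated with a minimal Python sketch. This is not the authors' predictor, which estimates coreness when no reference is available; it only computes the true coreness that such a predictor would try to approximate. The function names, the use of `-` as the gap character, the representation of alignments as lists of equal-length strings, and the choice of 0.0 for a column with no substitutions are all assumptions made for illustration.

```python
from itertools import combinations


def residue_indices(row):
    """For each column of an aligned row, give the residue's index in the
    ungapped sequence, or None where the row has a gap ('-')."""
    out, k = [], 0
    for ch in row:
        if ch == "-":
            out.append(None)
        else:
            out.append(k)
            k += 1
    return out


def core_substitutions(reference, core_columns):
    """Collect every aligned residue pair (substitution) that occurs in a
    core column of the reference alignment."""
    idx = [residue_indices(r) for r in reference]
    pairs = set()
    for c in core_columns:
        for i, j in combinations(range(len(reference)), 2):
            a, b = idx[i][c], idx[j][c]
            if a is not None and b is not None:
                pairs.add((i, a, j, b))
    return pairs


def column_coreness(computed, reference, core_columns):
    """True coreness of each column of the computed alignment: the fraction
    of its substitutions that appear in core columns of the reference.
    Assumption: a column with no substitutions gets coreness 0.0."""
    core = core_substitutions(reference, core_columns)
    idx = [residue_indices(r) for r in computed]
    result = []
    for c in range(len(computed[0])):
        subs = [
            (i, idx[i][c], j, idx[j][c])
            for i, j in combinations(range(len(computed)), 2)
            if idx[i][c] is not None and idx[j][c] is not None
        ]
        result.append(sum(s in core for s in subs) / len(subs) if subs else 0.0)
    return result


# Hypothetical toy data: three sequences, reference columns 0 and 3 are core.
reference = ["AC-G", "ACTG", "A-TG"]
computed = ["ACG-", "ACTG", "AT-G"]
print(column_coreness(computed, reference, {0, 3}))
```

Substitutions are keyed by sequence number and residue position rather than by letter, so two alignments of the same sequences can be compared even when their columns do not line up.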
