Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution

Models of protein evolution currently come in two flavors: generalist and specialist. Generalist models (e.g. PAM, JTT, WAG) adopt a one-size-fits-all approach, where a single model is estimated from a number of different protein alignments. Specialist models (e.g. mtREV, rtREV, HIVbetween) can be e...

Bibliographic Details
Main Authors:	Murrell, Ben, Weighill, Thomas, Buys, Jan, Ketteringham, Robert, Moola, Sasha, Benade, Gerdus, du Buisson, Lise, Kaliski, Daniel, Hands, Tristan, Scheffler, Konrad
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2011
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245233/ https://www.ncbi.nlm.nih.gov/pubmed/22216138 http://dx.doi.org/10.1371/journal.pone.0028898

author	Murrell, Ben Weighill, Thomas Buys, Jan Ketteringham, Robert Moola, Sasha Benade, Gerdus du Buisson, Lise Kaliski, Daniel Hands, Tristan Scheffler, Konrad
collection	PubMed
description	Models of protein evolution currently come in two flavors: generalist and specialist. Generalist models (e.g. PAM, JTT, WAG) adopt a one-size-fits-all approach, where a single model is estimated from a number of different protein alignments. Specialist models (e.g. mtREV, rtREV, HIVbetween) can be estimated when a large quantity of data are available for a single organism or gene, and are intended for use on that organism or gene only. Unsurprisingly, specialist models outperform generalist models, but in most instances there simply are not enough data available to estimate them. We propose a method for estimating alignment-specific models of protein evolution in which the complexity of the model is adapted to suit the richness of the data. Our method uses non-negative matrix factorization (NNMF) to learn a set of basis matrices from a general dataset containing a large number of alignments of different proteins, thus capturing the dimensions of important variation. It then learns a set of weights that are specific to the organism or gene of interest and for which only a smaller dataset is available. Thus the alignment-specific model is obtained as a weighted sum of the basis matrices. Having been constrained to vary along only as many dimensions as the data justify, the model has far fewer parameters than would be required to estimate a specialist model. We show that our NNMF procedure produces models that outperform existing methods on all but one of 50 test alignments. The basis matrices we obtain confirm the expectation that amino acid properties tend to be conserved, and allow us to quantify, on specific alignments, how the strength of conservation varies across different properties. We also apply our new models to phylogeny inference and show that the resulting phylogenies are different from, and have improved likelihood over, those inferred under standard models.
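The description above outlines two steps: learn a set of non-negative basis matrices from many alignments, then express an alignment-specific model as a non-negative weighted sum of those bases. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes each training alignment is summarized by a vector of exchangeabilities, one per unordered pair of states (190 for amino acids). The factorization uses Lee and Seung's multiplicative updates for the squared Frobenius error, which this record does not say the authors used. The helper names `nnmf`, `state_pairs` and `rate_matrix`, the pair ordering, and the reversible construction Q_ij = s_ij * pi_j are hypothetical illustrative choices. In the paper the alignment-specific weights are learned from the smaller dataset; here they are simply passed in.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*a)]

def nnmf(v, k, iters=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (m x n) as W (m x k) times H (k x n).

    Stand-in technique (an assumption): Lee-Seung multiplicative updates for
    the squared Frobenius error; eps only guards against division by zero.
    Columns of V are training alignments, so the columns of W are basis
    vectors and column j of H holds the weights for training alignment j.
    """
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)  # W^T V, W^T W H
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))  # V H^T, W H H^T
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h

def state_pairs(n_states):
    """Hypothetical ordering: unordered pairs (i < j), row-major upper triangle."""
    return [(i, j) for i in range(n_states) for j in range(i + 1, n_states)]

def rate_matrix(basis_vectors, weights, freqs):
    """Build a reversible rate matrix from sum_k weights[k] * basis_vectors[k].

    Each basis vector holds one exchangeability per state pair (190 for the
    20 amino acids). Off the diagonal Q[i][j] = s_ij * freqs[j]; the diagonal
    makes each row sum to zero.
    """
    n = len(freqs)
    pairs = state_pairs(n)
    if any(len(b) != len(pairs) for b in basis_vectors):
        raise ValueError("each basis vector needs one entry per state pair")
    if any(wk < 0 for wk in weights):
        raise ValueError("weights must be non-negative")
    # Weighted sum of the basis exchangeabilities.
    s = [sum(wk * b[p] for wk, b in zip(weights, basis_vectors))
         for p in range(len(pairs))]
    q = [[0.0] * n for _ in range(n)]
    for p, (i, j) in enumerate(pairs):
        q[i][j] = s[p] * freqs[j]
        q[j][i] = s[p] * freqs[i]
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q
```

For example, given a non-negative matrix `v` whose 190 rows are exchangeabilities and whose columns are training alignments, `w, h = nnmf(v, k)` returns `k` basis vectors as the columns of `w`. Then `rate_matrix(transpose(w), weights, freqs)` gives a model for a new alignment once `weights` have been chosen. Loosely, `k` plays the role of the number of dimensions along which the alignment-specific model is allowed to vary.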
format	Online Article Text
id	pubmed-3245233
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-3245233 2012-01-03 PLoS One Research Article Public Library of Science 2011-12-22 /pmc/articles/PMC3245233/ /pubmed/22216138 http://dx.doi.org/10.1371/journal.pone.0028898 Text en Murrell et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution
topic	Research Article

Similar Items