
K-Pax2: Bayesian identification of cluster-defining amino acid positions in large sequence datasets

The recent growth in publicly available sequence data has introduced new opportunities for studying microbial evolution and spread. Because the pace of sequence accumulation tends to exceed the pace of experimental studies of protein function and the roles of individual amino acids, statistical tool...


Bibliographic Details
Main authors:	Pessia, Alberto; Grad, Yonatan; Cobey, Sarah; Puranen, Juha Santeri; Corander, Jukka
Format:	Online Article Text
Language:	English
Published:	Society for General Microbiology 2015
Subjects:	Research Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5320600/ https://www.ncbi.nlm.nih.gov/pubmed/28348810 http://dx.doi.org/10.1099/mgen.0.000025

author	Pessia, Alberto Grad, Yonatan Cobey, Sarah Puranen, Juha Santeri Corander, Jukka
collection	PubMed
description	The recent growth in publicly available sequence data has introduced new opportunities for studying microbial evolution and spread. Because the pace of sequence accumulation tends to exceed the pace of experimental studies of protein function and the roles of individual amino acids, statistical tools to identify meaningful patterns in protein diversity are essential. Large sequence alignments from fast-evolving micro-organisms are particularly challenging to dissect using standard tools from phylogenetics and multivariate statistics because biologically relevant functional signals are easily masked by neutral variation and noise. To meet this need, a novel computational method is introduced that is easily executed in parallel using a cluster environment and can handle thousands of sequences with minimal subjective input from the user. The usefulness of this kind of machine learning is demonstrated by applying it to nearly 5000 haemagglutinin sequences of influenza A/H3N2. Antigenic and 3D structural mapping of the results show that the method can recover the major jumps in antigenic phenotype that occurred between 1968 and 2013 and identify specific amino acids associated with these changes. The method is expected to provide a useful tool to uncover patterns of protein evolution.
format	Online Article Text
id	pubmed-5320600
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Society for General Microbiology
record_format	MEDLINE/PubMed
spelling	pubmed-5320600 2017-03-27 K-Pax2: Bayesian identification of cluster-defining amino acid positions in large sequence datasets Pessia, Alberto Grad, Yonatan Cobey, Sarah Puranen, Juha Santeri Corander, Jukka Microb Genom Research Paper Society for General Microbiology 2015-07-15 /pmc/articles/PMC5320600/ /pubmed/28348810 http://dx.doi.org/10.1099/mgen.0.000025 Text en © 2015 The Authors http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/).
title	K-Pax2: Bayesian identification of cluster-defining amino acid positions in large sequence datasets
topic	Research Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5320600/ https://www.ncbi.nlm.nih.gov/pubmed/28348810 http://dx.doi.org/10.1099/mgen.0.000025
