Kpax3: Bayesian bi-clustering of large sequence datasets

MOTIVATION: Estimation of the hidden population structure is an important step in many genetic studies. Often the aim is also to identify which sequence locations are the most discriminative between groups of samples for a given data partition. Automated discovery of interesting patterns that are pr...

Bibliographic Details
Main Authors:	Pessia, Alberto, Corander, Jukka
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2018
Subjects:	Applications Notes
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881668/ https://www.ncbi.nlm.nih.gov/pubmed/29425273 http://dx.doi.org/10.1093/bioinformatics/bty056

_version_	1784879159490641920
author	Pessia, Alberto Corander, Jukka
collection	PubMed
description	MOTIVATION: Estimation of the hidden population structure is an important step in many genetic studies. Often the aim is also to identify which sequence locations are the most discriminative between groups of samples for a given data partition. Automated discovery of interesting patterns that are present in the data can help to generate new biological hypotheses. RESULTS: We introduce Kpax3, a Bayesian method for bi-clustering multiple sequence alignments. Influence of individual sites will be determined in a supervised manner by using informative prior distributions for the model parameters. Our inference method uses an implementation of both split-merge and Gibbs sampler type MCMC algorithms to traverse the joint posterior of partitions of samples and variables. We use a large Rotavirus sequence dataset to demonstrate the ability of Kpax3 to generate biologically important hypotheses about differential selective pressures across a virus protein. AVAILABILITY AND IMPLEMENTATION: Kpax3 is implemented as a Julia package and released under the MIT license. Source code and documentation are available at: https://github.com/albertopessia/Kpax3.jl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-9881668
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-9881668 2023-01-31 Kpax3: Bayesian bi-clustering of large sequence datasets Pessia, Alberto Corander, Jukka Bioinformatics Applications Notes MOTIVATION: Estimation of the hidden population structure is an important step in many genetic studies. Often the aim is also to identify which sequence locations are the most discriminative between groups of samples for a given data partition. Automated discovery of interesting patterns that are present in the data can help to generate new biological hypotheses. RESULTS: We introduce Kpax3, a Bayesian method for bi-clustering multiple sequence alignments. Influence of individual sites will be determined in a supervised manner by using informative prior distributions for the model parameters. Our inference method uses an implementation of both split-merge and Gibbs sampler type MCMC algorithms to traverse the joint posterior of partitions of samples and variables. We use a large Rotavirus sequence dataset to demonstrate the ability of Kpax3 to generate biologically important hypotheses about differential selective pressures across a virus protein. AVAILABILITY AND IMPLEMENTATION: Kpax3 is implemented as a Julia package and released under the MIT license. Source code and documentation are available at: https://github.com/albertopessia/Kpax3.jl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2018-02-07 /pmc/articles/PMC9881668/ /pubmed/29425273 http://dx.doi.org/10.1093/bioinformatics/bty056 Text en © The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
For commercial re-use, please contact journals.permissions@oup.com
title	Kpax3: Bayesian bi-clustering of large sequence datasets
topic	Applications Notes
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881668/ https://www.ncbi.nlm.nih.gov/pubmed/29425273 http://dx.doi.org/10.1093/bioinformatics/bty056
work_keys_str_mv	AT pessiaalberto kpax3bayesianbiclusteringoflargesequencedatasets AT coranderjukka kpax3bayesianbiclusteringoflargesequencedatasets
