
Discovering novel mutation signatures by latent Dirichlet allocation with variational Bayes inference

MOTIVATION: A cancer genome includes many mutations derived from various mutagens and mutational processes, leading to specific mutation patterns. It is known that each mutational process leads to characteristic mutations, and when a mutational process has preferences for mutations, this situation i...


Bibliographic Details
Main authors: Matsutani, Taro; Ueno, Yuki; Fukunaga, Tsukasa; Hamada, Michiaki
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6853711/
https://www.ncbi.nlm.nih.gov/pubmed/30993319
http://dx.doi.org/10.1093/bioinformatics/btz266
author Matsutani, Taro
Ueno, Yuki
Fukunaga, Tsukasa
Hamada, Michiaki
collection PubMed
description MOTIVATION: A cancer genome includes many mutations derived from various mutagens and mutational processes, leading to specific mutation patterns. Each mutational process is known to produce characteristic mutations, and the pattern that arises when a process has preferences for particular mutations is called a ‘mutation signature’. Identifying mutation signatures is an important task for elucidating carcinogenic mechanisms. Previous studies using statistical approaches (e.g. non-negative matrix factorization and latent Dirichlet allocation) have revealed a number of mutation signatures. Strictly speaking, however, these existing approaches employ an ad hoc method or an incorrect approximation to estimate the number of mutation signatures, so the whole picture of mutation signatures remains unclear.

RESULTS: In this study, we present a novel method for estimating the number of mutation signatures: latent Dirichlet allocation with variational Bayes inference (VB-LDA), in which variational lower bounds are used to find a plausible number of mutation patterns. In addition, we performed cluster analyses of the estimated mutation signatures to extract novel signatures that appear in multiple primary lesions. In a simulation with artificial data, we confirmed that our method estimated the correct number of mutation signatures. Furthermore, applying our method together with clustering procedures to real mutation data revealed many interesting mutation signatures that have not been previously reported.

AVAILABILITY AND IMPLEMENTATION: All the predicted mutation signatures with clustering results are freely available at http://www.f.waseda.jp/mhamada/MS/index.html. All the C++ source code and Python scripts used in this study can be downloaded from https://github.com/qkirikigaku/MS_LDA.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
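The idea of choosing the number of mutation signatures by comparing variational lower bounds can be illustrated with a minimal sketch. This is not the authors' C++ implementation from the MS_LDA repository; it is a generic batch variational Bayes fit of smoothed LDA in pure Python, under several assumptions made only for illustration: the function names (`digamma`, `fit_vb_lda`, `lower_bound`), the symmetric hyperparameters `ALPHA` and `ETA` (both 0.5), and the toy count matrix (20 samples, 6 mutation types, 2 planted signatures) are all hypothetical. Each candidate number of signatures is fitted, its lower bound on the log evidence is computed, and the candidate with the largest bound is kept.

```python
import math
import random


def digamma(x):
    """Digamma for x > 0: shift x upward by recurrence, then use an asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + math.log(x) - 0.5 / x - series


def dirichlet_expectation(params):
    """E[log x_i] under Dirichlet(params)."""
    total = digamma(sum(params))
    return [digamma(p) - total for p in params]


def log_dirichlet_norm(params):
    """Log normalising constant of Dirichlet(params)."""
    return math.lgamma(sum(params)) - sum(math.lgamma(p) for p in params)


def fit_vb_lda(counts, n_sigs, alpha=0.5, eta=0.5, n_iter=100, seed=0):
    """Batch coordinate-ascent VB; returns variational Dirichlet parameters."""
    rng = random.Random(seed)
    n_types = len(counts[0])
    # lam[k][w]: signature k over mutation types; gamma[d][k]: sample d over signatures.
    lam = [[eta + rng.gammavariate(100.0, 0.01) for _ in range(n_types)]
           for _ in range(n_sigs)]
    gamma = [[alpha + sum(row) / n_sigs] * n_sigs for row in counts]
    for _ in range(n_iter):
        elog_beta = [dirichlet_expectation(row) for row in lam]
        new_lam = [[eta] * n_types for _ in range(n_sigs)]
        for d, row in enumerate(counts):
            elog_theta = dirichlet_expectation(gamma[d])
            new_gamma = [alpha] * n_sigs
            for w, n in enumerate(row):
                if n == 0:
                    continue
                # Responsibilities of each signature for mutation type w in sample d.
                weights = [math.exp(elog_theta[k] + elog_beta[k][w]) for k in range(n_sigs)]
                norm = sum(weights)
                for k in range(n_sigs):
                    new_gamma[k] += n * weights[k] / norm
                    new_lam[k][w] += n * weights[k] / norm
            gamma[d] = new_gamma
        lam = new_lam
    return gamma, lam


def lower_bound(counts, gamma, lam, alpha, eta):
    """Variational lower bound, with responsibilities optimised for the given gamma and lam."""
    n_sigs, n_types = len(lam), len(lam[0])
    elog_beta = [dirichlet_expectation(row) for row in lam]
    bound = 0.0
    for row, g in zip(counts, gamma):
        elog_theta = dirichlet_expectation(g)
        for w, n in enumerate(row):
            if n:
                bound += n * math.log(sum(math.exp(elog_theta[k] + elog_beta[k][w])
                                          for k in range(n_sigs)))
        bound += log_dirichlet_norm([alpha] * n_sigs) - log_dirichlet_norm(g)
        bound += sum((alpha - g[k]) * elog_theta[k] for k in range(n_sigs))
    for k in range(n_sigs):
        bound += log_dirichlet_norm([eta] * n_types) - log_dirichlet_norm(lam[k])
        bound += sum((eta - lam[k][w]) * elog_beta[k][w] for w in range(n_types))
    return bound


# Toy data, generated here purely for illustration (not real mutation data).
rng = random.Random(1)
true_sigs = [[0.70, 0.10, 0.10, 0.05, 0.03, 0.02],
             [0.02, 0.03, 0.05, 0.10, 0.10, 0.70]]
counts = []
for _ in range(20):
    mix = rng.random()  # fraction of this sample's mutations drawn from signature 0
    row = [0] * 6
    for _ in range(200):
        sig = true_sigs[0] if rng.random() < mix else true_sigs[1]
        row[rng.choices(range(6), weights=sig)[0]] += 1
    counts.append(row)

ALPHA, ETA = 0.5, 0.5  # assumed symmetric Dirichlet hyperparameters
bounds = {}
for k in range(1, 5):
    gamma, lam = fit_vb_lda(counts, k, ALPHA, ETA)
    bounds[k] = lower_bound(counts, gamma, lam, ALPHA, ETA)
best_k = max(bounds, key=bounds.get)
print(bounds, best_k)
```

Because the bound includes the Dirichlet prior terms, adding a signature raises it only when the improved fit outweighs the extra complexity, which is what makes it usable as a selection criterion. In practice one would likely rerun each candidate from several random seeds and keep the best bound per candidate, since coordinate ascent can stop in a local optimum.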
format Online
Article
Text
id pubmed-6853711
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6853711 2019-11-19 Bioinformatics Original Papers Oxford University Press 2019-11-15 2019-04-16 /pmc/articles/PMC6853711/ /pubmed/30993319 http://dx.doi.org/10.1093/bioinformatics/btz266 Text en
© The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Discovering novel mutation signatures by latent Dirichlet allocation with variational Bayes inference
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6853711/
https://www.ncbi.nlm.nih.gov/pubmed/30993319
http://dx.doi.org/10.1093/bioinformatics/btz266