
A pan-cancer somatic mutation embedding using autoencoders

BACKGROUND: Next generation sequencing instruments are providing new opportunities for comprehensive analyses of cancer genomes. The increasing availability of tumor data makes it possible to study the complexity of cancer with machine learning methods. The large available repositories of high dimen...


Bibliographic Details
Main Authors: Palazzo, Martin; Beauseroy, Pierre; Yankilevich, Patricio
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907172/
https://www.ncbi.nlm.nih.gov/pubmed/31829157
http://dx.doi.org/10.1186/s12859-019-3298-z
_version_ 1783478495354028032
author Palazzo, Martin
Beauseroy, Pierre
Yankilevich, Patricio
author_facet Palazzo, Martin
Beauseroy, Pierre
Yankilevich, Patricio
author_sort Palazzo, Martin
collection PubMed
description BACKGROUND: Next generation sequencing instruments are providing new opportunities for comprehensive analyses of cancer genomes. The increasing availability of tumor data makes it possible to study the complexity of cancer with machine learning methods. The large available repositories of high dimensional tumor samples characterised with germline and somatic mutation data require advanced computational modelling for data interpretation. In this work, we propose to analyze this complex data with neural network learning, a methodology that has made impressive advances in image and natural language processing. RESULTS: Here we present a tumor mutation profile analysis pipeline based on an autoencoder model, which is used to discover better representations of lower dimensionality from large somatic mutation data of 40 different tumor types and subtypes. Kernel learning with hierarchical cluster analysis is used to assess the quality of the learned somatic mutation embedding, on which support vector machine models are used to accurately classify tumor subtypes. CONCLUSIONS: The learned latent space maps the original samples into a much lower-dimensional space while preserving the biological signals of the original tumor samples. This pipeline and the resulting embedding allow easier exploration of the heterogeneity within and across tumor types and accurate classification of tumor samples in the pan-cancer somatic mutation landscape.
format Online
Article
Text
id pubmed-6907172
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6907172 2019-12-20 A pan-cancer somatic mutation embedding using autoencoders Palazzo, Martin Beauseroy, Pierre Yankilevich, Patricio BMC Bioinformatics Methodology Article
BioMed Central 2019-12-11 /pmc/articles/PMC6907172/ /pubmed/31829157 http://dx.doi.org/10.1186/s12859-019-3298-z Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Palazzo, Martin
Beauseroy, Pierre
Yankilevich, Patricio
A pan-cancer somatic mutation embedding using autoencoders
title A pan-cancer somatic mutation embedding using autoencoders
title_full A pan-cancer somatic mutation embedding using autoencoders
title_fullStr A pan-cancer somatic mutation embedding using autoencoders
title_full_unstemmed A pan-cancer somatic mutation embedding using autoencoders
title_short A pan-cancer somatic mutation embedding using autoencoders
title_sort pan-cancer somatic mutation embedding using autoencoders
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907172/
https://www.ncbi.nlm.nih.gov/pubmed/31829157
http://dx.doi.org/10.1186/s12859-019-3298-z
work_keys_str_mv AT palazzomartin apancancersomaticmutationembeddingusingautoencoders
AT beauseroypierre apancancersomaticmutationembeddingusingautoencoders
AT yankilevichpatricio apancancersomaticmutationembeddingusingautoencoders
AT palazzomartin pancancersomaticmutationembeddingusingautoencoders
AT beauseroypierre pancancersomaticmutationembeddingusingautoencoders
AT yankilevichpatricio pancancersomaticmutationembeddingusingautoencoders
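
Illustrative note: the description field above summarises a pipeline in which an autoencoder compresses somatic mutation profiles into a lower-dimensional embedding. The following is a minimal sketch of that general idea only, not the authors' implementation. This record does not state the architecture, loss, hyperparameters, or data format used in the article, so everything below is an assumption: the function name `train_autoencoder`, the toy binary `profiles` (rows are samples, columns are genes, 1 means mutated), the single sigmoid hidden layer, the squared-error loss, and the learning rate are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_autoencoder(samples, latent_dim=2, epochs=300, lr=0.5, seed=0):
    """Train a one-hidden-layer autoencoder with plain SGD.

    Returns the encoder function and the mean reconstruction error per epoch.
    All settings are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    n_genes = len(samples[0])
    # Encoder weights: latent_dim x n_genes. Decoder weights: n_genes x latent_dim.
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_genes)] for _ in range(latent_dim)]
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(latent_dim)] for _ in range(n_genes)]
    b_enc = [0.0] * latent_dim
    b_dec = [0.0] * n_genes

    def encode(x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(w_enc, b_enc)]

    def decode(h):
        return [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
                for row, b in zip(w_dec, b_dec)]

    history = []
    for _ in range(epochs):
        total = 0.0
        for x in samples:
            h = encode(x)
            y = decode(h)
            total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
            # Gradients of the squared error through the sigmoid output layer.
            d_out = [2 * (yi - xi) * yi * (1 - yi) for yi, xi in zip(y, x)]
            # Back-propagate to the hidden layer before touching the decoder weights.
            d_hid = [sum(d_out[i] * w_dec[i][j] for i in range(n_genes)) * h[j] * (1 - h[j])
                     for j in range(latent_dim)]
            for i in range(n_genes):
                for j in range(latent_dim):
                    w_dec[i][j] -= lr * d_out[i] * h[j]
                b_dec[i] -= lr * d_out[i]
            for j in range(latent_dim):
                for k in range(n_genes):
                    w_enc[j][k] -= lr * d_hid[j] * x[k]
                b_enc[j] -= lr * d_hid[j]
        history.append(total / len(samples))
    return encode, history


# Hypothetical toy data: six samples, six genes, two made-up mutation patterns.
profiles = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
]

encode, history = train_autoencoder(profiles)
embedding = [encode(p) for p in profiles]  # one 2-dimensional point per sample
```

In a pipeline of the kind the description outlines, the rows of `embedding` would then be passed to downstream steps such as clustering or a support vector machine classifier; those steps are omitted here because the record gives no detail about them.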