
Mut2Vec: distributed representation of cancerous mutations

BACKGROUND: Embedding techniques for converting high-dimensional sparse data into low-dimensional distributed representations have been gaining popularity in various fields of research. In deep learning models, embedding is commonly used and proven to be more effective than naive binary representati...


Bibliographic Details
Main Authors:	Kim, Sunkyu, Lee, Heewon, Kim, Keonwoo, Kang, Jaewoo
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5918431/ https://www.ncbi.nlm.nih.gov/pubmed/29697361 http://dx.doi.org/10.1186/s12920-018-0349-7

_version_	1783317414567477248
author	Kim, Sunkyu Lee, Heewon Kim, Keonwoo Kang, Jaewoo
author_facet	Kim, Sunkyu Lee, Heewon Kim, Keonwoo Kang, Jaewoo
author_sort	Kim, Sunkyu
collection	PubMed
description	BACKGROUND: Embedding techniques for converting high-dimensional sparse data into low-dimensional distributed representations have been gaining popularity in various fields of research. In deep learning models, embedding is commonly used and has proven more effective than naive binary representation. However, no attempt has yet been made to embed highly sparse mutation profiles into densely distributed representations. Since binary representation does not capture biological context, its use is limited in many applications such as discovering novel driver mutations. Additionally, training distributed representations of mutations is challenging due to a relatively small amount of available biological data compared with the large amount of text corpus data in text mining fields. METHODS: We introduce Mut2Vec, a novel computational pipeline that can be used to create a distributed representation of cancerous mutations. Mut2Vec is trained on cancer profiles using Skip-Gram since cancer can be characterized by a series of co-occurring mutations. We also augmented our pipeline with existing information in the biomedical literature and protein-protein interaction networks to compensate for the data insufficiency. RESULTS: To evaluate our models, we conducted two experiments that involved the following tasks: a) visualizing driver and passenger mutations, b) identifying novel driver mutations using a clustering method. Our visualization showed a clear distinction between passenger mutations and driver mutations. We also found driver mutation candidates and confirmed through a literature survey that these were true driver mutations. The pre-trained mutation vectors and the candidate driver mutations are publicly available at http://infos.korea.ac.kr/mut2vec. CONCLUSIONS: We introduce Mut2Vec, which can be used to generate distributed representations of mutations, and experimentally validate the efficacy of the generated mutation representations. Mut2Vec can be used in various deep learning applications such as cancer classification and drug sensitivity prediction. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12920-018-0349-7) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5918431
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-59184312018-04-30 Mut2Vec: distributed representation of cancerous mutations Kim, Sunkyu Lee, Heewon Kim, Keonwoo Kang, Jaewoo BMC Med Genomics Research BACKGROUND: Embedding techniques for converting high-dimensional sparse data into low-dimensional distributed representations have been gaining popularity in various fields of research. In deep learning models, embedding is commonly used and has proven more effective than naive binary representation. However, no attempt has yet been made to embed highly sparse mutation profiles into densely distributed representations. Since binary representation does not capture biological context, its use is limited in many applications such as discovering novel driver mutations. Additionally, training distributed representations of mutations is challenging due to a relatively small amount of available biological data compared with the large amount of text corpus data in text mining fields. METHODS: We introduce Mut2Vec, a novel computational pipeline that can be used to create a distributed representation of cancerous mutations. Mut2Vec is trained on cancer profiles using Skip-Gram since cancer can be characterized by a series of co-occurring mutations. We also augmented our pipeline with existing information in the biomedical literature and protein-protein interaction networks to compensate for the data insufficiency. RESULTS: To evaluate our models, we conducted two experiments that involved the following tasks: a) visualizing driver and passenger mutations, b) identifying novel driver mutations using a clustering method. Our visualization showed a clear distinction between passenger mutations and driver mutations. We also found driver mutation candidates and confirmed through a literature survey that these were true driver mutations. The pre-trained mutation vectors and the candidate driver mutations are publicly available at http://infos.korea.ac.kr/mut2vec. CONCLUSIONS: We introduce Mut2Vec, which can be used to generate distributed representations of mutations, and experimentally validate the efficacy of the generated mutation representations. Mut2Vec can be used in various deep learning applications such as cancer classification and drug sensitivity prediction. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12920-018-0349-7) contains supplementary material, which is available to authorized users. BioMed Central 2018-04-20 /pmc/articles/PMC5918431/ /pubmed/29697361 http://dx.doi.org/10.1186/s12920-018-0349-7 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Kim, Sunkyu Lee, Heewon Kim, Keonwoo Kang, Jaewoo Mut2Vec: distributed representation of cancerous mutations
title	Mut2Vec: distributed representation of cancerous mutations
title_full	Mut2Vec: distributed representation of cancerous mutations
title_fullStr	Mut2Vec: distributed representation of cancerous mutations
title_full_unstemmed	Mut2Vec: distributed representation of cancerous mutations
title_short	Mut2Vec: distributed representation of cancerous mutations
title_sort	mut2vec: distributed representation of cancerous mutations
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5918431/ https://www.ncbi.nlm.nih.gov/pubmed/29697361 http://dx.doi.org/10.1186/s12920-018-0349-7
work_keys_str_mv	AT kimsunkyu mut2vecdistributedrepresentationofcancerousmutations AT leeheewon mut2vecdistributedrepresentationofcancerousmutations AT kimkeonwoo mut2vecdistributedrepresentationofcancerousmutations AT kangjaewoo mut2vecdistributedrepresentationofcancerousmutations
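
The description above says Mut2Vec is trained with Skip-Gram because cancer can be characterized by co-occurring mutations. As a rough illustration only, the sketch below shows one way (target, context) training pairs could be derived from mutation profiles. It is not the authors' pipeline: the toy profiles, the gene names chosen, the function name `skipgram_pairs`, and the choice to treat every other mutation in the same profile as context are all assumptions made for this example.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy data: each set lists the mutated genes seen in one sample.
# These profiles are invented for illustration and do not come from the article.
profiles = [
    {"TP53", "KRAS", "SMAD4"},
    {"TP53", "KRAS"},
    {"BRAF", "NRAS"},
]


def skipgram_pairs(profiles):
    """Yield (target, context) pairs for mutations that co-occur in a profile.

    Assumption: a mutation profile has no natural order, so every other
    mutation in the same profile is treated as context for the target.
    """
    for profile in profiles:
        # sorted() only makes the iteration order deterministic.
        for target, context in permutations(sorted(profile), 2):
            yield target, context


# Count how often each ordered pair would be presented to a Skip-Gram model.
pair_counts = Counter(skipgram_pairs(profiles))

for (target, context), count in sorted(pair_counts.items()):
    print(f"{target} -> {context}: {count}")
```

Pairs that appear in more profiles are presented more often during training, which is what would pull frequently co-occurring mutations toward each other in the embedding space. The literature and protein-protein interaction augmentation mentioned in the description is not covered by this sketch.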


Similar Items