
Deep learning of mutation-gene-drug relations from the literature



Bibliographic Details
Main Authors: Lee, Kyubum, Kim, Byounggun, Choi, Yonghwa, Kim, Sunkyu, Shin, Wonho, Lee, Sunwon, Park, Sungjoon, Kim, Seongsoon, Tan, Aik Choon, Kang, Jaewoo
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5784504/
https://www.ncbi.nlm.nih.gov/pubmed/29368597
http://dx.doi.org/10.1186/s12859-018-2029-1
Collection: PubMed
Abstract
BACKGROUND: Molecular biomarkers that can predict drug efficacy in cancer patients are crucial components for the advancement of precision medicine. However, identifying these molecular biomarkers remains a laborious and challenging task. Next-generation sequencing of patients and preclinical models has increasingly led to the identification of novel gene-mutation-drug relations, and these results have been reported and published in the scientific literature. RESULTS: Here, we present two new computational methods that use all PubMed articles as domain-specific background knowledge to assist in the extraction and curation of gene-mutation-drug relations from the literature. The first method uses scores from the Biomedical Entity Search Tool (BEST) as part of the feature set for training machine learning classifiers. The second method combines the BEST scores with word vectors, constructed from and trained on large document collections such as PubMed abstracts and Google News articles, in a deep convolutional neural network model. Using features derived from both the BEST search engine scores and the word vectors, we extract mutation-gene and mutation-drug relations from the literature with classifiers such as random forests and deep convolutional neural networks. Our methods achieved better results than state-of-the-art methods. With our proposed features, a simple machine learning model obtained F1-scores of 0.96 and 0.82 for mutation-gene and mutation-drug relation classification, respectively. We also developed a deep learning classification model using convolutional neural networks, BEST scores, and word embeddings pre-trained on PubMed or Google News data. Deep learning improved classification accuracy, yielding F1-scores of 0.96 and 0.86 for the mutation-gene and mutation-drug relations, respectively.
CONCLUSION: We believe that the computational methods described in this research could serve as an important tool for identifying molecular biomarkers that predict drug responses in cancer patients. We also built a database of the mutation-gene-drug relations extracted from all PubMed abstracts, which we believe can prove to be a valuable resource for precision medicine researchers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2029-1) contains supplementary material, which is available to authorized users.
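The F1-scores reported in the abstract are the harmonic mean of precision and recall on the relation-classification task. As a minimal sketch, assuming each candidate mutation-gene or mutation-drug pair is labelled 1 (related) or 0 (not related), the metric can be computed as below. The function name `relation_f1` and the toy labels are hypothetical illustrations, not taken from the paper, and this record does not state the paper's exact evaluation protocol (for example, data splits or averaging).

```python
from typing import Sequence, Tuple


def relation_f1(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive class (label 1).

    Hypothetical helper: assumes binary labels where 1 means the candidate
    mutation-gene or mutation-drug pair expresses a true relation.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Count true positives, false positives and false negatives for label 1.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g != 1 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p != 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 = 2PR / (P + R), written in the equivalent count form 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels for eight candidate pairs (illustrative only, not paper data).
    gold = [1, 1, 1, 0, 0, 1, 0, 1]
    pred = [1, 1, 0, 0, 1, 1, 0, 1]
    print(relation_f1(gold, pred))  # → (0.8, 0.8, 0.8)
```

The count form of F1 avoids dividing by precision + recall, and the sketch returns 0.0 when there are no positive labels or predictions at all; other tools may treat that edge case differently.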
Record ID: pubmed-5784504
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Research Article)
Published: BioMed Central, 2018-01-25
License: © The Author(s). 2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.