Mimvec: a deep learning approach for analyzing the human phenome
Main authors: | Gan, Mingxin; Li, Wenran; Zeng, Wanwen; Wang, Xiaojian; Jiang, Rui |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2017 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5615244/ https://www.ncbi.nlm.nih.gov/pubmed/28950906 http://dx.doi.org/10.1186/s12918-017-0451-z |
author | Gan, Mingxin; Li, Wenran; Zeng, Wanwen; Wang, Xiaojian; Jiang, Rui |
collection | PubMed |
description | BACKGROUND: The human phenome has been widely used with a variety of genomic data sources in the inference of disease genes. However, most existing methods derive phenotype similarity from biomedical databases using the traditional term frequency-inverse document frequency (TF-IDF) formulation. This framework, though intuitive, not only ignores semantic relationships between words but also tends to produce high-dimensional vectors, and hence cannot precisely capture the intrinsic semantic characteristics of biomedical documents. To overcome these limitations, we propose a framework called mimvec that analyzes the human phenome using a state-of-the-art deep learning technique from natural language processing. RESULTS: We converted 24,061 records in the Online Mendelian Inheritance in Man (OMIM) database to low-dimensional vectors using our method. We demonstrated that the vector representation not only effectively enabled classification of phenotype records against gene records, but also succeeded in discriminating diseases with different modes of inheritance and different mechanisms. We further derived pairwise phenotype similarities between 7988 human inherited diseases from their vector representations. A joint analysis of this phenome with multiple genomic data sources showed that phenotype overlap indeed implies genotype overlap. Finally, we used the derived phenotype similarities together with genomic data to prioritize candidate genes and demonstrated the advantages of this method over existing ones. CONCLUSIONS: Our method not only captures semantic relationships between words in biomedical records but also alleviates the curse of dimensionality that accompanies the traditional TF-IDF framework.
With the advent of precision medicine, abundant electronic medical and health records will await deep analysis, and we expect to see a wide range of applications borrowing the idea of our method in the near future. |
format | Online Article Text |
id | pubmed-5615244 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5615244; 2017-09-28; BMC Syst Biol; Research; BioMed Central 2017-09-21; /pmc/articles/PMC5615244/; /pubmed/28950906; http://dx.doi.org/10.1186/s12918-017-0451-z; Text; en; © The Author(s). 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
topic | Research |
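The abstract contrasts TF-IDF vectors with low-dimensional learned vectors. The Python sketch below illustrates that contrast on three hypothetical one-line records; it is not mimvec's implementation. The TF-IDF weighting here is an assumption (length-normalized term frequency times unsmoothed `log(N/df)`). The 3-dimensional word vectors in `TOY_EMBEDDINGS` are invented for illustration and stand in for vectors a trained embedding model would learn. Averaging word vectors into a record vector is likewise an assumed stand-in for whatever document model mimvec actually uses.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# A trained embedding model would learn such vectors from a corpus.
TOY_EMBEDDINGS = {
    "short": [0.9, 0.1, 0.0],
    "stature": [0.8, 0.2, 0.0],
    "dwarfism": [0.8, 0.3, 0.1],
    "seizures": [0.0, 0.1, 0.95],
}


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) with one dimension per distinct word."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # Document frequency: how many documents contain each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Length-normalized term frequency times unsmoothed IDF, log(N / df).
        vectors.append([tf[w] / len(toks) * math.log(n_docs / df[w]) for w in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_embedding(doc, embeddings):
    """Average the vectors of the words in doc that have an embedding."""
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[w] for w in doc.lower().split() if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


if __name__ == "__main__":
    records = ["short stature", "dwarfism", "seizures"]
    vocab, tfidf = tfidf_vectors(records)
    print(len(vocab))                  # 4: one dimension per distinct word
    print(cosine(tfidf[0], tfidf[1]))  # 0.0: the records share no words

    emb = [mean_embedding(r, TOY_EMBEDDINGS) for r in records]
    print(round(cosine(emb[0], emb[1]), 2))  # related phenotypes: high similarity
    print(round(cosine(emb[0], emb[2]), 2))  # unrelated phenotypes: low similarity
```

TF-IDF assigns one dimension per vocabulary word, so vector length grows with the corpus, and records that share no words, such as "short stature" and "dwarfism", get a cosine similarity of 0 even when they describe related phenotypes. The averaged embeddings keep a fixed dimension and can place such records close together, provided the word vectors encode that relationship.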