The research on gene-disease association based on text-mining of PubMed

BACKGROUND: The associations between genes and diseases are of critical significance in aspects of prevention, diagnosis and treatment. Although gene-disease relationships have been investigated extensively, much of the underpinnings of these associations are yet to be elucidated. METHODS: A novel m...

Bibliographic Details
Main authors: Zhou, Jie; Fu, Bo-quan
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5804013/
https://www.ncbi.nlm.nih.gov/pubmed/29415654
http://dx.doi.org/10.1186/s12859-018-2048-y
_version_ 1783298751980371968
author Zhou, Jie
Fu, Bo-quan
author_facet Zhou, Jie
Fu, Bo-quan
author_sort Zhou, Jie
collection PubMed
description BACKGROUND: The associations between genes and diseases are of critical significance for prevention, diagnosis and treatment. Although gene-disease relationships have been investigated extensively, many of the underpinnings of these associations are yet to be elucidated. METHODS: A novel method integrates the MeSH database, term weight (TW), and co-occurrence methods to predict gene-disease associations based on the cosine similarity between gene vectors and disease vectors. Vectors are derived from the texts of documents in the PubMed database according to the appearance and location of the gene or disease terms. The disease-related text data were optimized during the process of constructing vectors. RESULTS: The overall distribution of cosine similarity values was investigated. Using the gene-disease association data in the OMIM database as the gold standard, the performance of cosine similarity in predicting gene-disease linkage was evaluated. The effects of applying a weight matrix, penalty weights for keywords (PWK), and normalization were also investigated. Finally, we demonstrated that our method outperforms heterogeneous network edge prediction (HNEP) in terms of precision and recall. CONCLUSIONS: The method proposed in this paper is easy to carry out, and the results can be integrated with other models to improve the overall performance of gene-disease association predictions.
format Online
Article
Text
id pubmed-5804013
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-58040132018-02-14 The research on gene-disease association based on text-mining of PubMed Zhou, Jie Fu, Bo-quan BMC Bioinformatics Research Article BACKGROUND: The associations between genes and diseases are of critical significance in aspects of prevention, diagnosis and treatment. Although gene-disease relationships have been investigated extensively, much of the underpinnings of these associations are yet to be elucidated. METHODS: A novel method integrates MeSH database, term weight (TW), and co-occurrence methods to predict gene-disease associations based on the cosine similarity between gene vectors and disease vectors. Vectors are transformed from the texts of documents in the PubMed database according to the appearance and location of the gene or disease terms. The disease related text data has been optimized during the process of constructing vectors. RESULTS: The overall distribution of cosine similarity value was investigated. By using the gene-disease association data in OMIM database as golden standard, the performance of cosine similarity in predicting gene-disease linkage was evaluated. The effects of applying weight matrix, penalty weights for keywords (PWK), and normalization were also investigated. Finally, we demonstrated that our method outperforms heterogeneous network edge prediction (HNEP) in aspects of precision rate and recall rate. CONCLUSIONS: Our method proposed in this paper is easy to be conducted and the results can be integrated with other models to improve the overall performance of gene-disease association predictions. BioMed Central 2018-02-07 /pmc/articles/PMC5804013/ /pubmed/29415654 http://dx.doi.org/10.1186/s12859-018-2048-y Text en © The Author(s). 
2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Zhou, Jie
Fu, Bo-quan
The research on gene-disease association based on text-mining of PubMed
title The research on gene-disease association based on text-mining of PubMed
title_full The research on gene-disease association based on text-mining of PubMed
title_fullStr The research on gene-disease association based on text-mining of PubMed
title_full_unstemmed The research on gene-disease association based on text-mining of PubMed
title_short The research on gene-disease association based on text-mining of PubMed
title_sort research on gene-disease association based on text-mining of pubmed
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5804013/
https://www.ncbi.nlm.nih.gov/pubmed/29415654
http://dx.doi.org/10.1186/s12859-018-2048-y
work_keys_str_mv AT zhoujie theresearchongenediseaseassociationbasedontextminingofpubmed
AT fuboquan theresearchongenediseaseassociationbasedontextminingofpubmed
AT zhoujie researchongenediseaseassociationbasedontextminingofpubmed
AT fuboquan researchongenediseaseassociationbasedontextminingofpubmed
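
The abstract in this record describes scoring gene-disease pairs by the cosine similarity between a gene vector and a disease vector, then evaluating predictions against OMIM in terms of precision and recall. The following is a minimal sketch of that general idea only, not the authors' implementation: the sparse dict representation, the feature keys (`d1`, `d2`, ...), the weights, the `GENE_A`/`DISEASE_X` names, the 0.8 threshold, and the toy gold-standard set are all hypothetical placeholders. The record does not specify how term weights, the weight matrix, or PWK are computed, so they are not modeled here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(w * v.get(feature, 0.0) for feature, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector has no direction; treat as no similarity
    return dot / (norm_u * norm_v)


def predict_pairs(gene_vectors, disease_vectors, threshold):
    """Return every (gene, disease) pair whose cosine similarity meets the threshold."""
    return {
        (gene, disease)
        for gene, gv in gene_vectors.items()
        for disease, dv in disease_vectors.items()
        if cosine_similarity(gv, dv) >= threshold
    }


def precision_recall(predicted, gold):
    """Precision and recall of a predicted pair set against a gold-standard pair set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy vectors; the weights are invented for illustration only.
gene_vectors = {
    "GENE_A": {"d1": 2.0, "d2": 1.0},
    "GENE_B": {"d3": 1.0},
}
disease_vectors = {
    "DISEASE_X": {"d1": 1.0, "d2": 0.5},
    "DISEASE_Y": {"d3": 2.0, "d4": 2.0},
}
# Hypothetical stand-in for a curated gold standard such as OMIM.
gold = {("GENE_A", "DISEASE_X"), ("GENE_B", "DISEASE_Y")}

predicted = predict_pairs(gene_vectors, disease_vectors, threshold=0.8)
precision, recall = precision_recall(predicted, gold)
print(sorted(predicted), precision, recall)
```

Storing vectors as dicts keeps the sketch dependency-free and mirrors the sparsity of text-derived features; a real pipeline would more likely use a sparse matrix library and tune the threshold against held-out data.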