Identifying gene-disease associations using centrality on a literature mined gene-interaction network
Motivation: Understanding the role of genetics in diseases is one of the most important aims of the biological sciences. The completion of the Human Genome Project has led to a rapid increase in the number of publications in this area. However, the coverage of curated databases that provide informat...
Main Authors: | Özgür, Arzucan; Vu, Thuy; Erkan, Güneş; Radev, Dragomir R. |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press 2008 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2718658/ https://www.ncbi.nlm.nih.gov/pubmed/18586725 http://dx.doi.org/10.1093/bioinformatics/btn182 |
_version_ | 1782170010018381824 |
---|---|
author | Özgür, Arzucan Vu, Thuy Erkan, Güneş Radev, Dragomir R. |
author_facet | Özgür, Arzucan Vu, Thuy Erkan, Güneş Radev, Dragomir R. |
author_sort | Özgür, Arzucan |
collection | PubMed |
description | Motivation: Understanding the role of genetics in diseases is one of the most important aims of the biological sciences. The completion of the Human Genome Project has led to a rapid increase in the number of publications in this area. However, the coverage of curated databases that provide information manually extracted from the literature is limited. Another challenge is that determining disease-related genes requires laborious experiments. Therefore, predicting good candidate genes before experimental analysis will save time and effort. We introduce an automatic approach based on text mining and network analysis to predict gene-disease associations. We collected an initial set of known disease-related genes and built an interaction network by automatic literature mining based on dependency parsing and support vector machines. Our hypothesis is that the central genes in this disease-specific network are likely to be related to the disease. We used the degree, eigenvector, betweenness and closeness centrality metrics to rank the genes in the network. Results: The proposed approach can be used to extract known and to infer unknown gene-disease associations. We evaluated the approach for prostate cancer. Eigenvector and degree centrality achieved high accuracy. A total of 95% of the top 20 genes ranked by these methods are confirmed to be related to prostate cancer. On the other hand, betweenness and closeness centrality predicted more genes whose relation to the disease is currently unknown and are candidates for experimental study. Availability: A web-based system for browsing the disease-specific gene-interaction networks is available at: http://gin.ncibi.org Contact: radev@umich.edu |
format | Text |
id | pubmed-2718658 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-27186582009-07-31 Identifying gene-disease associations using centrality on a literature mined gene-interaction network Özgür, Arzucan Vu, Thuy Erkan, Güneş Radev, Dragomir R. Bioinformatics Ismb 2008 Conference Proceedings 19–23 July 2008, Toronto Motivation: Understanding the role of genetics in diseases is one of the most important aims of the biological sciences. The completion of the Human Genome Project has led to a rapid increase in the number of publications in this area. However, the coverage of curated databases that provide information manually extracted from the literature is limited. Another challenge is that determining disease-related genes requires laborious experiments. Therefore, predicting good candidate genes before experimental analysis will save time and effort. We introduce an automatic approach based on text mining and network analysis to predict gene-disease associations. We collected an initial set of known disease-related genes and built an interaction network by automatic literature mining based on dependency parsing and support vector machines. Our hypothesis is that the central genes in this disease-specific network are likely to be related to the disease. We used the degree, eigenvector, betweenness and closeness centrality metrics to rank the genes in the network. Results: The proposed approach can be used to extract known and to infer unknown gene-disease associations. We evaluated the approach for prostate cancer. Eigenvector and degree centrality achieved high accuracy. A total of 95% of the top 20 genes ranked by these methods are confirmed to be related to prostate cancer. On the other hand, betweenness and closeness centrality predicted more genes whose relation to the disease is currently unknown and are candidates for experimental study. 
Availability: A web-based system for browsing the disease-specific gene-interaction networks is available at: http://gin.ncibi.org Contact: radev@umich.edu Oxford University Press 2008-07-01 /pmc/articles/PMC2718658/ /pubmed/18586725 http://dx.doi.org/10.1093/bioinformatics/btn182 Text en © 2008 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Ismb 2008 Conference Proceedings 19–23 July 2008, Toronto Özgür, Arzucan Vu, Thuy Erkan, Güneş Radev, Dragomir R. Identifying gene-disease associations using centrality on a literature mined gene-interaction network |
title | Identifying gene-disease associations using centrality on a literature mined gene-interaction network |
title_sort | identifying gene-disease associations using centrality on a literature mined gene-interaction network |
topic | Ismb 2008 Conference Proceedings 19–23 July 2008, Toronto |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2718658/ https://www.ncbi.nlm.nih.gov/pubmed/18586725 http://dx.doi.org/10.1093/bioinformatics/btn182 |
work_keys_str_mv | AT ozgurarzucan identifyinggenediseaseassociationsusingcentralityonaliteratureminedgeneinteractionnetwork AT vuthuy identifyinggenediseaseassociationsusingcentralityonaliteratureminedgeneinteractionnetwork AT erkangunes identifyinggenediseaseassociationsusingcentralityonaliteratureminedgeneinteractionnetwork AT radevdragomirr identifyinggenediseaseassociationsusingcentralityonaliteratureminedgeneinteractionnetwork |
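
The abstract in this record describes ranking genes by centrality in a disease-specific interaction network. The following is a minimal sketch of that idea, not the authors' implementation: it computes only degree and closeness centrality (eigenvector and betweenness are omitted for brevity), and the network, the node names `G1` to `G5`, and the helper names are all hypothetical placeholders rather than data from the paper.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def closeness_centrality(graph):
    """Reachable-node count divided by the sum of shortest-path lengths."""
    scores = {}
    for source in graph:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def rank(scores):
    """Nodes ordered from most to least central; ties broken by name."""
    return sorted(scores, key=lambda g: (-scores[g], g))


# Hypothetical toy network (undirected, stored as adjacency sets).
# G1..G5 are placeholder labels, not real genes or real interactions.
toy_network = {
    "G1": {"G2", "G3", "G4"},
    "G2": {"G1", "G3"},
    "G3": {"G1", "G2"},
    "G4": {"G1", "G5"},
    "G5": {"G4"},
}

degree_ranking = rank(degree_centrality(toy_network))
closeness_ranking = rank(closeness_centrality(toy_network))
print("degree:   ", degree_ranking)
print("closeness:", closeness_ranking)
```

In this toy graph both metrics place `G1` first, but they order the remaining nodes differently, which loosely mirrors the abstract's observation that different centrality measures surface different candidate genes.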