Annotating gene sets by mining large literature collections with protein networks
Main authors: | Wang, Sheng; Ma, Jianzhu; Yu, Michael Ku; Zheng, Fan; Huang, Edward W; Han, Jiawei; Peng, Jian; Ideker, Trey |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5806628/ https://www.ncbi.nlm.nih.gov/pubmed/29218918 |
_version_ | 1783299160163745792 |
---|---|
author | Wang, Sheng Ma, Jianzhu Yu, Michael Ku Zheng, Fan Huang, Edward W Han, Jiawei Peng, Jian Ideker, Trey |
collection | PubMed |
description | Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations. |
format | Online Article Text |
id | pubmed-5806628 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
record_format | MEDLINE/PubMed |
spelling | pubmed-58066282018-02-09 Annotating gene sets by mining large literature collections with protein networks Wang, Sheng Ma, Jianzhu Yu, Michael Ku Zheng, Fan Huang, Edward W Han, Jiawei Peng, Jian Ideker, Trey Pac Symp Biocomput Article Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations. 2018 /pmc/articles/PMC5806628/ /pubmed/29218918 Text en http://creativecommons.org/licenses/by-nc/4.0/ Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License. |
title | Annotating gene sets by mining large literature collections with protein networks |
topic | Article |