Annotating gene sets by mining large literature collections with protein networks

Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networ...

Bibliographic Details
Main Authors: Wang, Sheng; Ma, Jianzhu; Yu, Michael Ku; Zheng, Fan; Huang, Edward W; Han, Jiawei; Peng, Jian; Ideker, Trey
Format: Online Article Text
Language: English
Published: 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5806628/
https://www.ncbi.nlm.nih.gov/pubmed/29218918
_version_ 1783299160163745792
author Wang, Sheng
Ma, Jianzhu
Yu, Michael Ku
Zheng, Fan
Huang, Edward W
Han, Jiawei
Peng, Jian
Ideker, Trey
author_facet Wang, Sheng
Ma, Jianzhu
Yu, Michael Ku
Zheng, Fan
Huang, Edward W
Han, Jiawei
Peng, Jian
Ideker, Trey
author_sort Wang, Sheng
collection PubMed
description Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations.
format Online
Article
Text
id pubmed-5806628
institution National Center for Biotechnology Information
language English
publishDate 2018
record_format MEDLINE/PubMed
spelling pubmed-58066282018-02-09 Annotating gene sets by mining large literature collections with protein networks Wang, Sheng Ma, Jianzhu Yu, Michael Ku Zheng, Fan Huang, Edward W Han, Jiawei Peng, Jian Ideker, Trey Pac Symp Biocomput Article Analysis of patient genomes and transcriptomes routinely recognizes new gene sets associated with human disease. Here we present an integrative natural language processing system which infers common functions for a gene set through automatic mining of the scientific literature with biological networks. This system links genes with associated literature phrases and combines these links with protein interactions in a single heterogeneous network. Multiscale functional annotations are inferred based on network distances between phrases and genes and then visualized as an ontology of biological concepts. To evaluate this system, we predict functions for gene sets representing known pathways and find that our approach achieves substantial improvement over the conventional text-mining baseline method. Moreover, our system discovers novel annotations for gene sets or pathways without previously known functions. Two case studies demonstrate how the system is used in discovery of new cancer-related pathways with ontological annotations. 2018 /pmc/articles/PMC5806628/ /pubmed/29218918 Text en http://creativecommons.org/licenses/by-nc/4.0/ Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License.
spellingShingle Article
Wang, Sheng
Ma, Jianzhu
Yu, Michael Ku
Zheng, Fan
Huang, Edward W
Han, Jiawei
Peng, Jian
Ideker, Trey
Annotating gene sets by mining large literature collections with protein networks
title Annotating gene sets by mining large literature collections with protein networks
title_full Annotating gene sets by mining large literature collections with protein networks
title_fullStr Annotating gene sets by mining large literature collections with protein networks
title_full_unstemmed Annotating gene sets by mining large literature collections with protein networks
title_short Annotating gene sets by mining large literature collections with protein networks
title_sort annotating gene sets by mining large literature collections with protein networks
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5806628/
https://www.ncbi.nlm.nih.gov/pubmed/29218918