
Can the vector space model be used to identify biological entity activities?

BACKGROUND: Biological systems are commonly described as networks of entity interactions. Some interactions are already known and integrate the current knowledge in life sciences. Others remain unknown for long periods of time and are frequently discovered by chance. In this work we present a model...


Bibliographic Details
Main authors: Maciel, Wesley D, Faria-Campos, Alessandra C, Gonçalves, Marcos A, Campos, Sérgio VA
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287578/
https://www.ncbi.nlm.nih.gov/pubmed/22369514
http://dx.doi.org/10.1186/1471-2164-12-S4-S1
author Maciel, Wesley D
Faria-Campos, Alessandra C
Gonçalves, Marcos A
Campos, Sérgio VA
collection PubMed
description BACKGROUND: Biological systems are commonly described as networks of entity interactions. Some interactions are already known and integrate the current knowledge in life sciences. Others remain unknown for long periods of time and are frequently discovered by chance. In this work we present a model to predict these unknown interactions from a textual collection using the vector space model (VSM), a well known and established information retrieval model. We have extended the VSM ability to retrieve information using a transitive closure approach. Our objective is to use the VSM to identify the known interactions from the literature and construct a network. Based on interactions established in the network our model applies the transitive closure in order to predict and rank new interactions. RESULTS: We have tested and validated our model using a collection of patent claims issued from 1976 to 2005. From 266,528 possible interactions in our network, the model identified 1,027 known interactions and predicted 3,195 new interactions. Iterating the model according to patent issue dates, interactions found in a given past year were often confirmed by patent claims not in the collection and issued in more recent years. Most confirmation patent claims were found at the top 100 new interactions obtained from each subnetwork. We have also found papers on the Web which confirm new inferred interactions. For instance, the best new interaction inferred by our model relates the interaction between the adrenaline neurotransmitter and the androgen receptor gene. We have found a paper that reports the partial dependence of the antiapoptotic effect of adrenaline on androgen receptor. CONCLUSIONS: The VSM extended with a transitive closure approach provides a good way to identify biological interactions from textual collections. 
Specifically, in the context of literature-based discovery, the extended VSM helps identify and rank relevant new interactions even if these interactions occur in only a few documents in the collection. Consequently, we have developed an efficient method for extracting and narrowing down the most promising results to consider as new advances in life sciences, even when indications of these results are not easily observed in a large mass of documents.
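The approach described in the abstract (use the VSM to find known interactions, build a network, then apply a transitive closure step to predict and rank new ones) can be illustrated with a minimal sketch. This is not the authors' implementation. The raw term counts, the cosine threshold, the product-of-similarities ranking score, and the toy entities `alpha`, `beta` and `gamma` are all assumptions made for illustration; the record does not state how the paper weights terms or scores predictions.

```python
import math
from itertools import combinations


def entity_vectors(docs, entities):
    """Represent each entity as a vector of its occurrence counts per document."""
    tokenized = [doc.lower().split() for doc in docs]
    return {e: [tokens.count(e) for tokens in tokenized] for e in entities}


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def known_interactions(vectors, threshold=0.0):
    """Entity pairs whose vectors are similar enough to count as 'known'.

    The threshold is an assumption of this sketch.
    """
    known = {}
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score > threshold:
            known[(a, b)] = score
    return known


def predict_interactions(known):
    """One transitive step: if a-b and b-c are known but a-c is not, predict a-c.

    The score (product of the two edge similarities) is an assumed ranking rule.
    """
    neighbors = {}
    for (a, b), score in known.items():
        neighbors.setdefault(a, {})[b] = score
        neighbors.setdefault(b, {})[a] = score
    predicted = {}
    for nbrs in neighbors.values():
        for a, c in combinations(sorted(nbrs), 2):
            if (a, c) in known:
                continue
            score = nbrs[a] * nbrs[c]
            predicted[(a, c)] = max(predicted.get((a, c), 0.0), score)
    return sorted(predicted.items(), key=lambda item: item[1], reverse=True)


# Invented toy documents; alpha and gamma never co-occur.
docs = ["alpha binds beta", "beta regulates gamma", "gamma alone"]
vectors = entity_vectors(docs, ["alpha", "beta", "gamma"])
known = known_interactions(vectors)
ranked = predict_interactions(known)
print(ranked)
```

In this toy collection `alpha` and `gamma` never appear in the same document, so they have no direct similarity; the pair is proposed only because both are linked to `beta`. That is the sense in which a transitive step can surface interactions that are not directly observable in any single document.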
format Online
Article
Text
id pubmed-3287578
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3287578 2012-02-28 Can the vector space model be used to identify biological entity activities? Maciel, Wesley D Faria-Campos, Alessandra C Gonçalves, Marcos A Campos, Sérgio VA BMC Genomics Proceedings BioMed Central 2011-12-22 /pmc/articles/PMC3287578/ /pubmed/22369514 http://dx.doi.org/10.1186/1471-2164-12-S4-S1 Text en Copyright ©2011 Maciel et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Can the vector space model be used to identify biological entity activities?
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287578/
https://www.ncbi.nlm.nih.gov/pubmed/22369514
http://dx.doi.org/10.1186/1471-2164-12-S4-S1