Can the vector space model be used to identify biological entity activities?
BACKGROUND: Biological systems are commonly described as networks of entity interactions. Some interactions are already known and integrate the current knowledge in life sciences. Others remain unknown for long periods of time and are frequently discovered by chance. In this work we present a model...
Main Authors: | Maciel, Wesley D; Faria-Campos, Alessandra C; Gonçalves, Marcos A; Campos, Sérgio VA |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | Proceedings |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287578/ https://www.ncbi.nlm.nih.gov/pubmed/22369514 http://dx.doi.org/10.1186/1471-2164-12-S4-S1 |
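
The abstract of this record describes a two-stage idea: use vector space similarity to identify known interactions and build a network, then apply a transitive step to predict and rank new interactions. The following is only a minimal sketch of that general idea, not the authors' implementation. The entity vectors, the similarity threshold, the product-of-weights scoring rule, and all function and entity names are hypothetical assumptions chosen for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def known_interactions(entity_vectors, threshold=0.0):
    """Link two entities when their similarity exceeds the (assumed) threshold."""
    edges = {}
    for a, b in combinations(sorted(entity_vectors), 2):
        score = cosine(entity_vectors[a], entity_vectors[b])
        if score > threshold:
            edges[(a, b)] = score
    return edges


def predict_interactions(edges):
    """One transitive step: propose A-C when A-B and B-C exist but A-C does not."""
    neighbors = {}
    for (a, b), w in edges.items():
        neighbors.setdefault(a, {})[b] = w
        neighbors.setdefault(b, {})[a] = w
    predicted = {}
    for nbrs in neighbors.values():
        for a, c in combinations(sorted(nbrs), 2):
            if (a, c) in edges:
                continue  # already a known interaction
            # Assumed scoring rule: product of the two edge weights,
            # keeping the best score over all intermediate entities.
            score = nbrs[a] * nbrs[c]
            predicted[(a, c)] = max(score, predicted.get((a, c), 0.0))
    return sorted(predicted.items(), key=lambda item: item[1], reverse=True)


# Toy data (made up): entity -> {document id: term weight}
toy_vectors = {
    "entity_a": {"d1": 1.0, "d2": 1.0},
    "entity_b": {"d2": 1.0, "d3": 1.0},
    "entity_c": {"d3": 1.0},
}
known = known_interactions(toy_vectors)
ranked = predict_interactions(known)
```

In this toy setup, `entity_a` and `entity_c` never share a document, so they are not linked directly; the transitive step proposes the pair because both are linked to `entity_b`. Multiplying edge weights is one simple way to rank such candidates, and other scoring rules would fit the same structure.
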
_version_ | 1782224695409508352 |
---|---|
author | Maciel, Wesley D Faria-Campos, Alessandra C Gonçalves, Marcos A Campos, Sérgio VA |
author_facet | Maciel, Wesley D Faria-Campos, Alessandra C Gonçalves, Marcos A Campos, Sérgio VA |
author_sort | Maciel, Wesley D |
collection | PubMed |
description | BACKGROUND: Biological systems are commonly described as networks of entity interactions. Some interactions are already known and integrate the current knowledge in life sciences. Others remain unknown for long periods of time and are frequently discovered by chance. In this work we present a model to predict these unknown interactions from a textual collection using the vector space model (VSM), a well known and established information retrieval model. We have extended the VSM ability to retrieve information using a transitive closure approach. Our objective is to use the VSM to identify the known interactions from the literature and construct a network. Based on interactions established in the network our model applies the transitive closure in order to predict and rank new interactions. RESULTS: We have tested and validated our model using a collection of patent claims issued from 1976 to 2005. From 266,528 possible interactions in our network, the model identified 1,027 known interactions and predicted 3,195 new interactions. Iterating the model according to patent issue dates, interactions found in a given past year were often confirmed by patent claims not in the collection and issued in more recent years. Most confirmation patent claims were found at the top 100 new interactions obtained from each subnetwork. We have also found papers on the Web which confirm new inferred interactions. For instance, the best new interaction inferred by our model relates the interaction between the adrenaline neurotransmitter and the androgen receptor gene. We have found a paper that reports the partial dependence of the antiapoptotic effect of adrenaline on androgen receptor. CONCLUSIONS: The VSM extended with a transitive closure approach provides a good way to identify biological interactions from textual collections. 
Specifically, in the context of literature-based discovery, the extended VSM helps identify and rank relevant new interactions even if these interactions occur in only a few documents in the collection. Consequently, we have developed an efficient method for extracting and restricting the best potential results to consider as new advances in life sciences, even when indications of these results are not easily observed from a mass of documents. |
format | Online Article Text |
id | pubmed-3287578 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3287578 2012-02-28 Can the vector space model be used to identify biological entity activities? Maciel, Wesley D Faria-Campos, Alessandra C Gonçalves, Marcos A Campos, Sérgio VA BMC Genomics Proceedings BioMed Central 2011-12-22 /pmc/articles/PMC3287578/ /pubmed/22369514 http://dx.doi.org/10.1186/1471-2164-12-S4-S1 Text en Copyright ©2011 Maciel et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Maciel, Wesley D Faria-Campos, Alessandra C Gonçalves, Marcos A Campos, Sérgio VA Can the vector space model be used to identify biological entity activities? |
title | Can the vector space model be used to identify biological entity activities? |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287578/ https://www.ncbi.nlm.nih.gov/pubmed/22369514 http://dx.doi.org/10.1186/1471-2164-12-S4-S1 |