EnvMine: A text-mining system for the automatic extraction of contextual information


Bibliographic Details
Main authors: Tamames, Javier; de Lorenzo, Victor
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2901371/
https://www.ncbi.nlm.nih.gov/pubmed/20515448
http://dx.doi.org/10.1186/1471-2105-11-294
author Tamames, Javier
de Lorenzo, Victor
collection PubMed
description BACKGROUND: For ecological studies, it is crucial to have adequate descriptions of the environments and samples being studied. Such descriptions must be made in terms of their physicochemical characteristics, allowing direct comparisons between different environments that would otherwise be difficult. The characterization must also include the precise geographical location, so that geographical distributions and biogeographical patterns can be studied. Currently, there is no schema for annotating these environmental features, and these data have to be extracted from textual sources (published articles). So far, this has been done by manual inspection of the corresponding documents. To facilitate this task, we have developed EnvMine, a set of text-mining tools devoted to retrieving contextual information (physicochemical variables and geographical locations) from textual sources of any kind. RESULTS: EnvMine is capable of retrieving the physicochemical variables cited in the text by accurately identifying their associated units of measurement. In this task, the system achieves a recall (percentage of relevant items retrieved) of 92% with less than 1% error. A Bayesian classifier was also tested for distinguishing parts of the text describing environmental characteristics from others dealing with, for instance, experimental settings. For the identification of geographical locations, the system takes advantage of existing databases such as GeoNames to achieve 86% recall with 92% precision. Identifying a location also includes determining its exact coordinates (latitude and longitude), which allows the distances between individual locations to be calculated. CONCLUSION: EnvMine is a very efficient method for extracting contextual information from different text sources, such as published articles or web pages.
This tool can help determine the precise location and physicochemical variables of sampling sites, thus facilitating ecological analyses. EnvMine can also help in the development of standards for the annotation of environmental features.
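The unit-based extraction of physicochemical variables described in the RESULTS can be illustrated with a short Python sketch. It is not EnvMine's code: the unit lexicon (UNIT_TO_VARIABLE), the function extract_variables and the regular expression are hypothetical, and a real system would need a much larger lexicon plus handling of ranges, compound units and ambiguous symbols. The sketch only shows the core idea that a number immediately followed by a known unit is taken as a mention of the variable that unit signals.

```python
import re
from typing import List, Tuple

# Hypothetical unit lexicon (an assumption, not EnvMine's own list):
# each unit string maps to the physicochemical variable it usually signals.
UNIT_TO_VARIABLE = {
    "°C": "temperature",
    "mg/L": "concentration",
    "mM": "concentration",
    "m": "depth/altitude",
    "psu": "salinity",
    "mS/cm": "conductivity",
}

# Try longer units first so the most specific unit is attempted before a
# shorter prefix such as "m".
_UNITS = sorted(UNIT_TO_VARIABLE, key=len, reverse=True)

# A number (optionally negative or decimal), optional whitespace, then a
# known unit that is not followed by another letter, digit or slash.
_PATTERN = re.compile(
    r"(?<![\w.])(-?\d+(?:\.\d+)?)\s*("
    + "|".join(re.escape(unit) for unit in _UNITS)
    + r")(?![\w/])"
)


def extract_variables(text: str) -> List[Tuple[str, float, str]]:
    """Return (variable, value, unit) triples for each number+unit mention."""
    mentions = []
    for match in _PATTERN.finditer(text):
        value, unit = float(match.group(1)), match.group(2)
        mentions.append((UNIT_TO_VARIABLE[unit], value, unit))
    return mentions


if __name__ == "__main__":
    sentence = ("Samples were taken at 120 m depth, where the water was "
                "4.5 °C with 35 psu salinity and 0.8 mg/L nitrate, "
                "after 15 min of stirring.")
    for variable, value, unit in extract_variables(sentence):
        print(f"{variable}: {value} {unit}")
```

On the example sentence this reports depth, temperature, salinity and concentration, while the trailing lookahead rejects "15 min", where "m" is only the start of another word. Recognizing units rather than variable names is what lets the approach work on free text, since authors name variables in many ways but tend to write units consistently.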
format Text
id pubmed-2901371
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2901371 2010-07-10 EnvMine: A text-mining system for the automatic extraction of contextual information Tamames, Javier de Lorenzo, Victor BMC Bioinformatics Methodology article BioMed Central 2010-06-01 /pmc/articles/PMC2901371/ /pubmed/20515448 http://dx.doi.org/10.1186/1471-2105-11-294 Text en Copyright ©2010 Tamames and de Lorenzo; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title EnvMine: A text-mining system for the automatic extraction of contextual information
topic Methodology article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2901371/
https://www.ncbi.nlm.nih.gov/pubmed/20515448
http://dx.doi.org/10.1186/1471-2105-11-294