Terminologies for text-mining; an experiment in the lipoprotein metabolism domain
BACKGROUND: The engineering of ontologies, especially with a view to a text-mining use, is still a new research field. There does not yet exist a well-defined theory and technology for ontology construction. Many of the ontology design steps remain manual and are based on personal experience and int...
Main authors: | Alexopoulou, Dimitra; Wächter, Thomas; Pickersgill, Laura; Eyre, Cecilia; Schroeder, Michael |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367629/ https://www.ncbi.nlm.nih.gov/pubmed/18460175 http://dx.doi.org/10.1186/1471-2105-9-S4-S2 |
_version_ | 1782154337685864448 |
---|---|
author | Alexopoulou, Dimitra; Wächter, Thomas; Pickersgill, Laura; Eyre, Cecilia; Schroeder, Michael |
author_sort | Alexopoulou, Dimitra |
collection | PubMed |
description | BACKGROUND: The engineering of ontologies, especially with a view to a text-mining use, is still a new research field. There does not yet exist a well-defined theory and technology for ontology construction. Many of the ontology design steps remain manual and are based on personal experience and intuition. However, there exist a few efforts on automatic construction of ontologies in the form of extracted lists of terms and relations between them. RESULTS: We share experience acquired during the manual development of a lipoprotein metabolism ontology (LMO) to be used for text-mining. We compare the manually created ontology terms with the automatically derived terminology from four different automatic term recognition (ATR) methods. The top 50 predicted terms contain up to 89% relevant terms. For the top 1000 terms the best method still generates 51% relevant terms. In a corpus of 3066 documents 53% of LMO terms are contained and 38% can be generated with one of the methods. CONCLUSIONS: Given high precision, automatic methods can help decrease development time and provide significant support for the identification of domain-specific vocabulary. The coverage of the domain vocabulary depends strongly on the underlying documents. Ontology development for text mining should be performed in a semi-automatic way; taking ATR results as input and following the guidelines we described. AVAILABILITY: The TFIDF term recognition is available as Web Service, described at |
format | Text |
id | pubmed-2367629 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2367629 2008-05-07 Terminologies for text-mining; an experiment in the lipoprotein metabolism domain Alexopoulou, Dimitra Wächter, Thomas Pickersgill, Laura Eyre, Cecilia Schroeder, Michael BMC Bioinformatics Research BACKGROUND: The engineering of ontologies, especially with a view to a text-mining use, is still a new research field. There does not yet exist a well-defined theory and technology for ontology construction. Many of the ontology design steps remain manual and are based on personal experience and intuition. However, there exist a few efforts on automatic construction of ontologies in the form of extracted lists of terms and relations between them. RESULTS: We share experience acquired during the manual development of a lipoprotein metabolism ontology (LMO) to be used for text-mining. We compare the manually created ontology terms with the automatically derived terminology from four different automatic term recognition (ATR) methods. The top 50 predicted terms contain up to 89% relevant terms. For the top 1000 terms the best method still generates 51% relevant terms. In a corpus of 3066 documents 53% of LMO terms are contained and 38% can be generated with one of the methods. CONCLUSIONS: Given high precision, automatic methods can help decrease development time and provide significant support for the identification of domain-specific vocabulary. The coverage of the domain vocabulary depends strongly on the underlying documents. Ontology development for text mining should be performed in a semi-automatic way; taking ATR results as input and following the guidelines we described. AVAILABILITY: The TFIDF term recognition is available as Web Service, described at BioMed Central 2008-04-25 /pmc/articles/PMC2367629/ /pubmed/18460175 http://dx.doi.org/10.1186/1471-2105-9-S4-S2 Text en Copyright © 2008 Alexopoulou et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Alexopoulou, Dimitra Wächter, Thomas Pickersgill, Laura Eyre, Cecilia Schroeder, Michael Terminologies for text-mining; an experiment in the lipoprotein metabolism domain |
title | Terminologies for text-mining; an experiment in the lipoprotein metabolism domain |
title_sort | terminologies for text-mining; an experiment in the lipoprotein metabolism domain |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367629/ https://www.ncbi.nlm.nih.gov/pubmed/18460175 http://dx.doi.org/10.1186/1471-2105-9-S4-S2 |