
Terminologies for text-mining; an experiment in the lipoprotein metabolism domain

BACKGROUND: The engineering of ontologies, especially with a view to a text-mining use, is still a new research field. There does not yet exist a well-defined theory and technology for ontology construction. Many of the ontology design steps remain manual and are based on personal experience and int...


Bibliographic Details
Main authors: Alexopoulou, Dimitra; Wächter, Thomas; Pickersgill, Laura; Eyre, Cecilia; Schroeder, Michael
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367629/
https://www.ncbi.nlm.nih.gov/pubmed/18460175
http://dx.doi.org/10.1186/1471-2105-9-S4-S2
_version_ 1782154337685864448
author Alexopoulou, Dimitra
Wächter, Thomas
Pickersgill, Laura
Eyre, Cecilia
Schroeder, Michael
collection PubMed
description BACKGROUND: The engineering of ontologies, especially with a view to a text-mining use, is still a new research field. There does not yet exist a well-defined theory and technology for ontology construction. Many of the ontology design steps remain manual and are based on personal experience and intuition. However, there exist a few efforts on automatic construction of ontologies in the form of extracted lists of terms and relations between them. RESULTS: We share experience acquired during the manual development of a lipoprotein metabolism ontology (LMO) to be used for text-mining. We compare the manually created ontology terms with the automatically derived terminology from four different automatic term recognition (ATR) methods. The top 50 predicted terms contain up to 89% relevant terms. For the top 1000 terms the best method still generates 51% relevant terms. In a corpus of 3066 documents 53% of LMO terms are contained and 38% can be generated with one of the methods. CONCLUSIONS: Given high precision, automatic methods can help decrease development time and provide significant support for the identification of domain-specific vocabulary. The coverage of the domain vocabulary depends strongly on the underlying documents. Ontology development for text mining should be performed in a semi-automatic way; taking ATR results as input and following the guidelines we described. AVAILABILITY: The TFIDF term recognition is available as Web Service, described at
format Text
id pubmed-2367629
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2367629 2008-05-07 Terminologies for text-mining; an experiment in the lipoprotein metabolism domain Alexopoulou, Dimitra Wächter, Thomas Pickersgill, Laura Eyre, Cecilia Schroeder, Michael BMC Bioinformatics Research BioMed Central 2008-04-25 /pmc/articles/PMC2367629/ /pubmed/18460175 http://dx.doi.org/10.1186/1471-2105-9-S4-S2 Text en Copyright © 2008 Alexopoulou et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Terminologies for text-mining; an experiment in the lipoprotein metabolism domain
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367629/
https://www.ncbi.nlm.nih.gov/pubmed/18460175
http://dx.doi.org/10.1186/1471-2105-9-S4-S2