Facilitating the development of controlled vocabularies for metabolomics technologies with text mining


Bibliographic Details
Main Authors: Spasić, Irena, Schober, Daniel, Sansone, Susanna-Assunta, Rebholz-Schuhmann, Dietrich, Kell, Douglas B, Paton, Norman W
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367623/
https://www.ncbi.nlm.nih.gov/pubmed/18460187
http://dx.doi.org/10.1186/1471-2105-9-S5-S5
author Spasić, Irena
Schober, Daniel
Sansone, Susanna-Assunta
Rebholz-Schuhmann, Dietrich
Kell, Douglas B
Paton, Norman W
collection PubMed
description BACKGROUND: Many bioinformatics applications rely on controlled vocabularies or ontologies to consistently interpret and seamlessly integrate information scattered across public resources. Experimental data sets from metabolomics studies need to be integrated with one another, but also with data produced by other types of omics studies in the spirit of systems biology, hence the pressing need for vocabularies and ontologies in metabolomics. However, it is time-consuming and non trivial to construct these resources manually. RESULTS: We describe a methodology for rapid development of controlled vocabularies, a study originally motivated by the needs for vocabularies describing metabolomics technologies. We present case studies involving two controlled vocabularies (for nuclear magnetic resonance spectroscopy and gas chromatography) whose development is currently underway as part of the Metabolomics Standards Initiative. The initial vocabularies were compiled manually, providing a total of 243 and 152 terms. A total of 5,699 and 2,612 new terms were acquired automatically from the literature. The analysis of the results showed that full-text articles (especially the Materials and Methods sections) are the major source of technology-specific terms as opposed to paper abstracts. CONCLUSIONS: We suggest a text mining method for efficient corpus-based term acquisition as a way of rapidly expanding a set of controlled vocabularies with the terms used in the scientific literature. We adopted an integrative approach, combining relatively generic software and data resources for time- and cost-effective development of a text mining tool for expansion of controlled vocabularies across various domains, as a practical alternative to both manual term collection and tailor-made named entity recognition methods.
format Text
id pubmed-2367623
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2367623 2008-05-07 BMC Bioinformatics Proceedings BioMed Central 2008-04-29 /pmc/articles/PMC2367623/ /pubmed/18460187 http://dx.doi.org/10.1186/1471-2105-9-S5-S5 Text en
Copyright © 2008 Spasić et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Facilitating the development of controlled vocabularies for metabolomics technologies with text mining
topic Proceedings