
Finding biomedical categories in Medline(®)

BACKGROUND: There are several humanly defined ontologies relevant to Medline. However, Medline is a fast growing collection of biomedical documents which creates difficulties in updating and expanding these humanly defined ontologies. Automatically identifying meaningful categories of entities in a...


Bibliographic Details
Main Authors:	Yeganova, Lana; Kim, Won; Comeau, Donald C; Wilbur, W John
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3465206/ https://www.ncbi.nlm.nih.gov/pubmed/23046816 http://dx.doi.org/10.1186/2041-1480-3-S3-S3

_version_	1782245527473094656
author	Yeganova, Lana; Kim, Won; Comeau, Donald C; Wilbur, W John
author_facet	Yeganova, Lana; Kim, Won; Comeau, Donald C; Wilbur, W John
author_sort	Yeganova, Lana
collection	PubMed
description	BACKGROUND: There are several humanly defined ontologies relevant to Medline. However, Medline is a fast growing collection of biomedical documents which creates difficulties in updating and expanding these humanly defined ontologies. Automatically identifying meaningful categories of entities in a large text corpus is useful for information extraction, construction of machine learning features, and development of semantic representations. In this paper we describe and compare two methods for automatically learning meaningful biomedical categories in Medline. The first approach is a simple statistical method that uses part-of-speech and frequency information to extract a list of frequent nouns from Medline. The second method implements an alignment-based technique to learn frequent generic patterns that indicate a hyponymy/hypernymy relationship between a pair of noun phrases. We then apply these patterns to Medline to collect frequent hypernyms as potential biomedical categories. RESULTS: We study and compare these two alternative sets of terms to identify semantic categories in Medline. We find that both approaches produce reasonable terms as potential categories. We also find that there is a significant agreement between the two sets of terms. The overlap between the two methods improves our confidence regarding categories predicted by these independent methods. CONCLUSIONS: This study is an initial attempt to extract categories that are discussed in Medline. Rather than imposing external ontologies on Medline, our methods allow categories to emerge from the text.
format	Online Article Text
id	pubmed-3465206
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3465206 2012-10-18 Finding biomedical categories in Medline(®) Yeganova, Lana; Kim, Won; Comeau, Donald C; Wilbur, W John J Biomed Semantics Research
BioMed Central 2012-10-05 /pmc/articles/PMC3465206/ /pubmed/23046816 http://dx.doi.org/10.1186/2041-1480-3-S3-S3 Text en Copyright ©2012 The article is a work of the United States Government; Title 17 U.S.C. § 105 provides that copyright protection is not available for any work of the United States government in the United States; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Yeganova, Lana Kim, Won Comeau, Donald C Wilbur, W John Finding biomedical categories in Medline(®)
title	Finding biomedical categories in Medline(®)
title_full	Finding biomedical categories in Medline(®)
title_fullStr	Finding biomedical categories in Medline(®)
title_full_unstemmed	Finding biomedical categories in Medline(®)
title_short	Finding biomedical categories in Medline(®)
title_sort	finding biomedical categories in medline(®)
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3465206/ https://www.ncbi.nlm.nih.gov/pubmed/23046816 http://dx.doi.org/10.1186/2041-1480-3-S3-S3
work_keys_str_mv	AT yeganovalana findingbiomedicalcategoriesinmedline AT kimwon findingbiomedicalcategoriesinmedline AT comeaudonaldc findingbiomedicalcategoriesinmedline AT wilburwjohn findingbiomedicalcategoriesinmedline
