A semantic-based method for extracting concept definitions from scientific publications: evaluation in the autism phenotype domain

Bibliographic Details
Main Authors: Hassanpour, Saeed, O’Connor, Martin J, Das, Amar K
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3765483/
https://www.ncbi.nlm.nih.gov/pubmed/23937724
http://dx.doi.org/10.1186/2041-1480-4-14
author Hassanpour, Saeed
O’Connor, Martin J
Das, Amar K
collection PubMed
description BACKGROUND: A variety of informatics approaches have been developed that use information retrieval, NLP and text-mining techniques to identify biomedical concepts and relations within scientific publications or their sentences. These approaches have not typically addressed the challenge of extracting more complex knowledge such as biomedical definitions. In our efforts to facilitate knowledge acquisition of rule-based definitions of autism phenotypes, we have developed a novel semantic-based text-mining approach that can automatically identify such definitions within text. RESULTS: Using an existing knowledge base of 156 autism phenotype definitions and an annotated corpus of 26 source articles containing such definitions, we evaluated and compared the average rank of correctly identified rule definition or corresponding rule template using both our semantic-based approach and a standard term-based approach. We examined three separate scenarios: (1) the snippet of text contained a definition already in the knowledge base; (2) the snippet contained an alternative definition for a concept in the knowledge base; and (3) the snippet contained a definition not in the knowledge base. Our semantic-based approach had a higher average rank than the term-based approach for each of the three scenarios (scenario 1: 3.8 vs. 5.0; scenario 2: 2.8 vs. 4.9; and scenario 3: 4.5 vs. 6.2), with each comparison significant at the p-value of 0.05 using the Wilcoxon signed-rank test. CONCLUSIONS: Our work shows that leveraging existing domain knowledge in the information extraction of biomedical definitions significantly improves the correct identification of such knowledge within sentences. Our method can thus help researchers rapidly acquire knowledge about biomedical definitions that are specified and evolving within an ever-growing corpus of scientific publications.
format Online
Article
Text
id pubmed-3765483
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3765483 2013-09-10 J Biomed Semantics Research BioMed Central 2013-08-12 /pmc/articles/PMC3765483/ /pubmed/23937724 http://dx.doi.org/10.1186/2041-1480-4-14 Text en Copyright © 2013 Hassanpour et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A semantic-based method for extracting concept definitions from scientific publications: evaluation in the autism phenotype domain
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3765483/
https://www.ncbi.nlm.nih.gov/pubmed/23937724
http://dx.doi.org/10.1186/2041-1480-4-14