
Argument-predicate distance as a filter for enhancing precision in extracting predications on the genetic etiology of disease

BACKGROUND: Genomic functional information is valuable for biomedical research. However, such information frequently needs to be extracted from the scientific literature and structured in order to be exploited by automatic systems. Natural language processing is increasingly used for this purpose al...


Bibliographic Details
Main authors:	Masseroli, Marco; Kilicoglu, Halil; Lang, François-Michel; Rindflesch, Thomas C
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1564420/ https://www.ncbi.nlm.nih.gov/pubmed/16762065 http://dx.doi.org/10.1186/1471-2105-7-291

collection	PubMed
description	BACKGROUND: Genomic functional information is valuable for biomedical research. However, such information frequently needs to be extracted from the scientific literature and structured in order to be exploited by automatic systems. Natural language processing is increasingly used for this purpose although it inherently involves errors. A postprocessing strategy that selects relations most likely to be correct is proposed and evaluated on the output of SemGen, a system that extracts semantic predications on the etiology of genetic diseases. Based on the number of intervening phrases between an argument and its predicate, we defined a heuristic strategy to filter the extracted semantic relations according to their likelihood of being correct. We also applied this strategy to relations identified with co-occurrence processing. Finally, we exploited postprocessed SemGen predications to investigate the genetic basis of Parkinson's disease.
RESULTS: The filtering procedure for increased precision is based on the intuition that arguments which occur close to their predicate are easier to identify than those at a distance. For example, if gene-gene relations are filtered for arguments at a distance of 1 phrase from the predicate, precision increases from 41.95% (baseline) to 70.75%. Since this proximity filtering is based on syntactic structure, applying it to the results of co-occurrence processing is useful, but not as effective as when applied to the output of natural language processing. In an effort to exploit SemGen predications on the etiology of disease after increasing precision with postprocessing, a gene list was derived from extracted information enhanced with postprocessing filtering and was automatically annotated with GFINDer, a Web application that dynamically retrieves functional and phenotypic information from structured biomolecular resources. Two of the genes in this list are likely relevant to Parkinson's disease but are not associated with this disease in several important databases on genetic disorders.
CONCLUSION: Information based on the proximity postprocessing method we suggest is of sufficient quality to be profitably used for subsequent applications aimed at uncovering new biomedical knowledge. Although proximity filtering is only marginally effective for enhancing the precision of relations extracted with co-occurrence processing, it is likely to benefit methods based, even partially, on syntactic structure, regardless of the relation.
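The proximity filter described in the abstract can be illustrated with a minimal sketch. It is not SemGen's implementation: the `Predication` record, its field names, the rule that both arguments must be within the threshold, and the convention that an adjacent phrase counts as distance 1 are all assumptions made for illustration. The toy predications are invented and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predication:
    """One extracted relation; every field here is an illustrative assumption."""
    subject: str
    predicate: str
    obj: str
    subject_phrase: int    # index of the phrase holding the subject
    predicate_phrase: int  # index of the phrase holding the predicate
    object_phrase: int     # index of the phrase holding the object
    is_correct: bool       # gold-standard judgment, used only for evaluation


def argument_distance(arg_phrase: int, predicate_phrase: int) -> int:
    """Distance in phrases; an adjacent phrase counts as distance 1 (assumed)."""
    return abs(arg_phrase - predicate_phrase)


def proximity_filter(preds, max_distance=1):
    """Keep predications whose two arguments both lie within max_distance."""
    return [
        p for p in preds
        if argument_distance(p.subject_phrase, p.predicate_phrase) <= max_distance
        and argument_distance(p.object_phrase, p.predicate_phrase) <= max_distance
    ]


def precision(preds):
    """Fraction of predications judged correct (0.0 for an empty list)."""
    if not preds:
        return 0.0
    return sum(p.is_correct for p in preds) / len(preds)


# Toy data, invented for illustration; not taken from the paper.
toy = [
    Predication("GENE_A", "INTERACTS_WITH", "GENE_B", 0, 1, 2, True),
    Predication("GENE_C", "INHIBITS", "GENE_D", 0, 3, 4, False),
    Predication("GENE_E", "STIMULATES", "GENE_F", 2, 3, 4, True),
    Predication("GENE_G", "INTERACTS_WITH", "GENE_H", 1, 2, 6, False),
]

kept = proximity_filter(toy, max_distance=1)
print(len(toy), precision(toy))    # baseline count and precision
print(len(kept), precision(kept))  # count and precision after filtering
```

Raising `max_distance` keeps more predications at the likely cost of precision, which is the trade-off the abstract reports for gene-gene relations.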
id	pubmed-1564420
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
record	pubmed-1564420, 2006-09-14
journal	BMC Bioinformatics
published	BioMed Central 2006-06-08
rights	Copyright © 2006 Masseroli et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.