The contribution of co-reference resolution to supervised relation detection between bacteria and biotopes entities
BACKGROUND: The acquisition of knowledge about relations between bacteria and their locations (habitats and geographical locations) in short texts about bacteria, as defined in the BioNLP-ST 2013 Bacteria Biotope task, depends on the detection of co-reference links between mentions of entities of ea...
Main authors: | Lavergne, Thomas; Grouin, Cyril; Zweigenbaum, Pierre |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2015 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4511182/ https://www.ncbi.nlm.nih.gov/pubmed/26201352 http://dx.doi.org/10.1186/1471-2105-16-S10-S6 |
_version_ | 1782382290299518976 |
---|---|
author | Lavergne, Thomas; Grouin, Cyril; Zweigenbaum, Pierre |
author_sort | Lavergne, Thomas |
collection | PubMed |
description | BACKGROUND: The acquisition of knowledge about relations between bacteria and their locations (habitats and geographical locations) in short texts about bacteria, as defined in the BioNLP-ST 2013 Bacteria Biotope task, depends on the detection of co-reference links between mentions of entities of each of these three types. To our knowledge, no participant in this task has investigated this aspect of the situation. The present work specifically addresses issues raised by this situation: (i) how to detect these co-reference links and associated co-reference chains; (ii) how to use them to prepare positive and negative examples to train a supervised system for the detection of relations between entity mentions; (iii) what context around which entity mentions contributes to relation detection when co-reference chains are provided. RESULTS: We present experiments and results obtained both with gold entity mentions (task 2 of BioNLP-ST 2013) and with automatically detected entity mentions (end-to-end system, in task 3 of BioNLP-ST 2013). Our supervised mention detection system uses a linear chain Conditional Random Fields classifier, and our relation detection system relies on a Logistic Regression (aka Maximum Entropy) classifier. They use a set of morphological, morphosyntactic and semantic features. To minimize false inferences, co-reference resolution applies a set of heuristic rules designed to optimize precision. They take into account the types of the detected entity mentions, and take advantage of the didactic nature of the texts of the corpus, where a large proportion of bacteria naming is fairly explicit (although natural referring expressions such as "the bacteria" are common). The resulting system achieved a 0.495 F-measure on the official test set when taking as input the gold entity mentions, and a 0.351 F-measure when taking as input entity mentions predicted by our CRF system, both of which are above the best BioNLP-ST 2013 participant system. CONCLUSIONS: We show that co-reference resolution substantially improves over a baseline system which does not use co-reference information: about 3.5 F-measure points on the test corpus for the end-to-end system (5.5 points on the development corpus) and 7 F-measure points on both development and test corpora when gold mentions are used. While this outperforms the best published system on the BioNLP-ST 2013 Bacteria Biotope dataset, we consider that it provides mostly a stronger baseline from which more work can be started. We also emphasize the importance and difficulty of designing a comprehensive gold standard co-reference annotation, which we explain is a key point to further progress on the task. |
format | Online Article Text |
id | pubmed-4511182 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4511182 2015-07-28 The contribution of co-reference resolution to supervised relation detection between bacteria and biotopes entities Lavergne, Thomas Grouin, Cyril Zweigenbaum, Pierre BMC Bioinformatics Research BioMed Central 2015-07-13 /pmc/articles/PMC4511182/ /pubmed/26201352 http://dx.doi.org/10.1186/1471-2105-16-S10-S6 Text en Copyright © 2015 Lavergne et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | The contribution of co-reference resolution to supervised relation detection between bacteria and biotopes entities |
title_sort | contribution of co-reference resolution to supervised relation detection between bacteria and biotopes entities |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4511182/ https://www.ncbi.nlm.nih.gov/pubmed/26201352 http://dx.doi.org/10.1186/1471-2105-16-S10-S6 |
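The abstract in this record mentions using co-reference chains to prepare positive and negative training examples, and reports results as F-measures. The following is a minimal Python sketch of one way such labeling and scoring could work. It is not the authors' procedure: the function names (`chain_of`, `label_pairs`, `f_measure`), the mention identifiers (`T1`, `T2`, `T3`) and the rule "a pair is positive if any co-referent mentions are related in the gold standard" are all assumptions made for illustration.

```python
from itertools import product


def chain_of(mention, chains):
    """Return the co-reference chain containing `mention`, or a singleton chain."""
    for chain in chains:
        if mention in chain:
            return chain
    return frozenset([mention])


def label_pairs(bacteria, locations, gold_relations, chains):
    """Label every (bacterium, location) mention pair as positive or negative.

    Assumed rule (hypothetical): a pair is positive if the gold standard relates
    any mention co-referent with the bacterium to any mention co-referent with
    the location.
    """
    examples = {}
    for b, loc in product(bacteria, locations):
        b_chain = chain_of(b, chains)
        l_chain = chain_of(loc, chains)
        examples[(b, loc)] = any(
            (x, y) in gold_relations for x in b_chain for y in l_chain
        )
    return examples


def f_measure(predicted, gold):
    """Balanced F-measure (harmonic mean of precision and recall) over relation sets."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (invented): T1 is a named bacterium, T2 is "the bacteria", T3 is a habitat.
bacteria = ["T1", "T2"]
locations = ["T3"]
gold_relations = {("T1", "T3")}
chains = [frozenset({"T1", "T2"})]  # T1 and T2 refer to the same bacterium

with_chains = label_pairs(bacteria, locations, gold_relations, chains)
without_chains = label_pairs(bacteria, locations, gold_relations, [])
```

In this toy case the pair (`T2`, `T3`) is labeled positive only when the chain linking `T2` to `T1` is supplied, which illustrates why missing co-reference links can turn true relations into negative examples.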