A sentence sliding window approach to extract protein annotations from biomedical articles
BACKGROUND: Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a great need of comparative assessment of the performance of the...
Main Authors: | Krallinger, Martin; Padron, Maria; Valencia, Alfonso |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2005 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869011/ https://www.ncbi.nlm.nih.gov/pubmed/15960831 http://dx.doi.org/10.1186/1471-2105-6-S1-S19 |
_version_ | 1782133426962300928 |
---|---|
author | Krallinger, Martin Padron, Maria Valencia, Alfonso |
author_facet | Krallinger, Martin Padron, Maria Valencia, Alfonso |
author_sort | Krallinger, Martin |
collection | PubMed |
description | BACKGROUND: Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a great need of comparative assessment of the performance of the proposed methods and the development of common evaluation criteria. This issue was addressed by the Critical Assessment of Text Mining Methods in Molecular Biology (BioCreative) contest. The aim of this contest was to assess the performance of text mining systems applied to biomedical texts including tools which recognize named entities such as genes and proteins, and tools which automatically extract protein annotations. RESULTS: The "sentence sliding window" approach proposed here was found to efficiently extract text fragments from full text articles containing annotations on proteins, providing the highest number of correctly predicted annotations. Moreover, the number of correct extractions of individual entities (i.e. proteins and GO terms) involved in the relationships used for the annotations was significantly higher than the correct extractions of the complete annotations (protein-function relations). CONCLUSION: We explored the use of averaging sentence sliding windows for information extraction, especially in a context where conventional training data is unavailable. The combination of our approach with more refined statistical estimators and machine learning techniques might be a way to improve annotation extraction for future biomedical text mining applications. |
format | Text |
id | pubmed-1869011 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2005 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1869011 2007-05-18 A sentence sliding window approach to extract protein annotations from biomedical articles Krallinger, Martin Padron, Maria Valencia, Alfonso BMC Bioinformatics Report BACKGROUND: Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a great need of comparative assessment of the performance of the proposed methods and the development of common evaluation criteria. This issue was addressed by the Critical Assessment of Text Mining Methods in Molecular Biology (BioCreative) contest. The aim of this contest was to assess the performance of text mining systems applied to biomedical texts including tools which recognize named entities such as genes and proteins, and tools which automatically extract protein annotations. RESULTS: The "sentence sliding window" approach proposed here was found to efficiently extract text fragments from full text articles containing annotations on proteins, providing the highest number of correctly predicted annotations. Moreover, the number of correct extractions of individual entities (i.e. proteins and GO terms) involved in the relationships used for the annotations was significantly higher than the correct extractions of the complete annotations (protein-function relations). CONCLUSION: We explored the use of averaging sentence sliding windows for information extraction, especially in a context where conventional training data is unavailable. The combination of our approach with more refined statistical estimators and machine learning techniques might be a way to improve annotation extraction for future biomedical text mining applications.
BioMed Central 2005-05-24 /pmc/articles/PMC1869011/ /pubmed/15960831 http://dx.doi.org/10.1186/1471-2105-6-S1-S19 Text en Copyright © 2005 Krallinger et al; licensee BioMed Central Ltd http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Report Krallinger, Martin Padron, Maria Valencia, Alfonso A sentence sliding window approach to extract protein annotations from biomedical articles |
title | A sentence sliding window approach to extract protein annotations from biomedical articles |
title_full | A sentence sliding window approach to extract protein annotations from biomedical articles |
title_fullStr | A sentence sliding window approach to extract protein annotations from biomedical articles |
title_full_unstemmed | A sentence sliding window approach to extract protein annotations from biomedical articles |
title_short | A sentence sliding window approach to extract protein annotations from biomedical articles |
title_sort | sentence sliding window approach to extract protein annotations from biomedical articles |
topic | Report |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869011/ https://www.ncbi.nlm.nih.gov/pubmed/15960831 http://dx.doi.org/10.1186/1471-2105-6-S1-S19 |
work_keys_str_mv | AT krallingermartin asentenceslidingwindowapproachtoextractproteinannotationsfrombiomedicalarticles AT padronmaria asentenceslidingwindowapproachtoextractproteinannotationsfrombiomedicalarticles AT valenciaalfonso asentenceslidingwindowapproachtoextractproteinannotationsfrombiomedicalarticles AT krallingermartin sentenceslidingwindowapproachtoextractproteinannotationsfrombiomedicalarticles AT padronmaria sentenceslidingwindowapproachtoextractproteinannotationsfrombiomedicalarticles AT valenciaalfonso sentenceslidingwindowapproachtoextractproteinannotationsfrombiomedicalarticles |