A sentence sliding window approach to extract protein annotations from biomedical articles

BACKGROUND: Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a great need of comparative assessment of the performance of the...

Bibliographic Details
Main Authors: Krallinger, Martin; Padron, Maria; Valencia, Alfonso
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869011/
https://www.ncbi.nlm.nih.gov/pubmed/15960831
http://dx.doi.org/10.1186/1471-2105-6-S1-S19
_version_ 1782133426962300928
author Krallinger, Martin
Padron, Maria
Valencia, Alfonso
author_facet Krallinger, Martin
Padron, Maria
Valencia, Alfonso
author_sort Krallinger, Martin
collection PubMed
description BACKGROUND: Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a great need of comparative assessment of the performance of the proposed methods and the development of common evaluation criteria. This issue was addressed by the Critical Assessment of Text Mining Methods in Molecular Biology (BioCreative) contest. The aim of this contest was to assess the performance of text mining systems applied to biomedical texts including tools which recognize named entities such as genes and proteins, and tools which automatically extract protein annotations. RESULTS: The "sentence sliding window" approach proposed here was found to efficiently extract text fragments from full text articles containing annotations on proteins, providing the highest number of correctly predicted annotations. Moreover, the number of correct extractions of individual entities (i.e. proteins and GO terms) involved in the relationships used for the annotations was significantly higher than the correct extractions of the complete annotations (protein-function relations). CONCLUSION: We explored the use of averaging sentence sliding windows for information extraction, especially in a context where conventional training data is unavailable. The combination of our approach with more refined statistical estimators and machine learning techniques might be a way to improve annotation extraction for future biomedical text mining applications.
format Text
id pubmed-1869011
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1869011 2007-05-18 A sentence sliding window approach to extract protein annotations from biomedical articles Krallinger, Martin Padron, Maria Valencia, Alfonso BMC Bioinformatics Report BACKGROUND: Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a great need of comparative assessment of the performance of the proposed methods and the development of common evaluation criteria. This issue was addressed by the Critical Assessment of Text Mining Methods in Molecular Biology (BioCreative) contest. The aim of this contest was to assess the performance of text mining systems applied to biomedical texts including tools which recognize named entities such as genes and proteins, and tools which automatically extract protein annotations. RESULTS: The "sentence sliding window" approach proposed here was found to efficiently extract text fragments from full text articles containing annotations on proteins, providing the highest number of correctly predicted annotations. Moreover, the number of correct extractions of individual entities (i.e. proteins and GO terms) involved in the relationships used for the annotations was significantly higher than the correct extractions of the complete annotations (protein-function relations). CONCLUSION: We explored the use of averaging sentence sliding windows for information extraction, especially in a context where conventional training data is unavailable. The combination of our approach with more refined statistical estimators and machine learning techniques might be a way to improve annotation extraction for future biomedical text mining applications.
BioMed Central 2005-05-24 /pmc/articles/PMC1869011/ /pubmed/15960831 http://dx.doi.org/10.1186/1471-2105-6-S1-S19 Text en Copyright © 2005 Krallinger et al; licensee BioMed Central Ltd http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Report
Krallinger, Martin
Padron, Maria
Valencia, Alfonso
A sentence sliding window approach to extract protein annotations from biomedical articles
title A sentence sliding window approach to extract protein annotations from biomedical articles
title_full A sentence sliding window approach to extract protein annotations from biomedical articles
title_fullStr A sentence sliding window approach to extract protein annotations from biomedical articles
title_full_unstemmed A sentence sliding window approach to extract protein annotations from biomedical articles
title_short A sentence sliding window approach to extract protein annotations from biomedical articles
title_sort sentence sliding window approach to extract protein annotations from biomedical articles
topic Report
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1869011/
https://www.ncbi.nlm.nih.gov/pubmed/15960831
http://dx.doi.org/10.1186/1471-2105-6-S1-S19
work_keys_str_mv AT krallingermartin asentenceslidingwindowapproachtoextractproteinannotationsfrombiomedicalarticles
AT padronmaria asentenceslidingwindowapproachtoextractproteinannotationsfrombiomedicalarticles
AT valenciaalfonso asentenceslidingwindowapproachtoextractproteinannotationsfrombiomedicalarticles
AT krallingermartin sentenceslidingwindowapproachtoextractproteinannotationsfrombiomedicalarticles
AT padronmaria sentenceslidingwindowapproachtoextractproteinannotationsfrombiomedicalarticles
AT valenciaalfonso sentenceslidingwindowapproachtoextractproteinannotationsfrombiomedicalarticles