CRFVoter: gene and protein related object recognition using a conglomerate of CRF-based tools

Bibliographic details
Main authors: Hemati, Wahed; Mehler, Alexander
Format: Online Article Text
Language: English
Published: Springer International Publishing 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6419804/
https://www.ncbi.nlm.nih.gov/pubmed/30874918
http://dx.doi.org/10.1186/s13321-019-0343-x
author Hemati, Wahed
Mehler, Alexander
collection PubMed
description BACKGROUND: Gene and protein related objects are an important class of entities in biomedical research, whose identification and extraction from scientific articles is attracting increasing interest. In this work, we describe an approach to the BioCreative V.5 challenge regarding the recognition and classification of gene and protein related objects. For this purpose, we transform the task as posed by BioCreative V.5 into a sequence labeling problem. We present a series of sequence labeling systems that we used and adapted in our experiments for solving this task. Our experiments show how to optimize the hyperparameters of the classifiers involved. To this end, we utilize various algorithms for hyperparameter optimization. Finally, we present CRFVoter, a two-stage application of Conditional Random Fields (CRF) that integrates the optimized sequence labelers from our study into one ensemble classifier. RESULTS: We analyze the impact of hyperparameter optimization on named entity recognition in biomedical research and show that this optimization yields a performance increase of up to 60%. In our evaluation, our ensemble classifier based on multiple sequence labelers, called CRFVoter, outperforms each individual extractor. On the blinded test set provided by the BioCreative organizers, CRFVoter achieves an F-score of 75%, a recall of 71% and a precision of 80%. In the GPRO type 1 evaluation, CRFVoter achieves an F-score of 73% and a recall of 70%, and achieves the best precision (77%) among all task participants. CONCLUSION: CRFVoter is effective when multiple sequence labeling systems are to be used, and it performs better than the individual systems it combines.
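The reported scores can be cross-checked with a small calculation. This is a sketch that assumes the F-score in the abstract is the balanced F1 (the harmonic mean of precision and recall) computed from the same precision and recall values; the function name `f1_score` is a hypothetical helper, not part of the authors' code. Because the abstract gives only whole-percent values, the recomputed F1 can only be expected to agree up to rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) as stated in the abstract, rounded to whole percent.
reported = {
    "blinded test set": (0.80, 0.71, 0.75),
    "GPRO type 1": (0.77, 0.70, 0.73),
}

for name, (p, r, f_reported) in reported.items():
    # Recompute F1 from the rounded precision and recall and compare with the reported value.
    print(f"{name}: F1 = {f1_score(p, r):.4f} (reported {f_reported:.2f})")
```

Running it prints approximately 0.7523 for the blinded test set and 0.7333 for GPRO type 1, both consistent with the reported 75% and 73% after rounding.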
format Online
Article
Text
id pubmed-6419804
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-6419804 2019-03-28 CRFVoter: gene and protein related object recognition using a conglomerate of CRF-based tools Hemati, Wahed Mehler, Alexander J Cheminform Research Article
Springer International Publishing 2019-03-14 /pmc/articles/PMC6419804/ /pubmed/30874918 http://dx.doi.org/10.1186/s13321-019-0343-x Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title CRFVoter: gene and protein related object recognition using a conglomerate of CRF-based tools
topic Research Article