
Extract antibody and antigen names from biomedical literature


Bibliographic Details
Main authors:	Dinh, Thuy Trang, Vo-Chanh, Trang Phuong, Nguyen, Chau, Huynh, Viet Quoc, Vo, Nam, Nguyen, Hoang Duc
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2022
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9727932/ https://www.ncbi.nlm.nih.gov/pubmed/36474140 http://dx.doi.org/10.1186/s12859-022-04993-4

author	Dinh, Thuy Trang; Vo-Chanh, Trang Phuong; Nguyen, Chau; Huynh, Viet Quoc; Vo, Nam; Nguyen, Hoang Duc
author_sort	Dinh, Thuy Trang
collection	PubMed
description	BACKGROUND: Antibodies and antigens play indispensable roles in targeted diagnosis, therapy, and biomedical discovery. Moreover, large numbers of new scientific articles about antibodies and/or antigens are published each year; they form a precious knowledge resource that has yet to be exploited to its full potential. We therefore aim to develop a biomedical natural language processing tool that can automatically identify antibody and antigen entities in articles. RESULTS: We first annotated an antibody-antigen corpus of 3210 relevant PubMed abstracts using a semi-automatic approach. The Inter-Annotator Agreement scores of the 3 annotators range from 91.46% to 94.31%, indicating that the annotations are consistent and the corpus is reliable. We then used the corpus to develop and optimize BiLSTM-CRF-based and BioBERT-based models. The models achieved overall F1 scores of 62.49% and 81.44%, respectively, showing promise for these newly studied entity types. The two models served as the foundation for a named entity recognition (NER) tool that automatically recognizes antibody and antigen names in biomedical literature. CONCLUSIONS: Our antibody-antigen NER models enable users to automatically extract antibody and antigen names from scientific articles without manually scanning the vast amount of information in the literature. The NER output can be used to automatically populate antibody-antigen databases, support antibody validation, and help researchers find the most appropriate antibodies of interest. The packaged NER model is available at https://github.com/TrangDinh44/ABAG_BioBERT.git.
format	Online Article Text
id	pubmed-9727932
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-9727932 2022-12-08 Extract antibody and antigen names from biomedical literature Dinh, Thuy Trang; Vo-Chanh, Trang Phuong; Nguyen, Chau; Huynh, Viet Quoc; Vo, Nam; Nguyen, Hoang Duc BMC Bioinformatics Research
BioMed Central 2022-12-06 /pmc/articles/PMC9727932/ /pubmed/36474140 http://dx.doi.org/10.1186/s12859-022-04993-4 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Extract antibody and antigen names from biomedical literature
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9727932/ https://www.ncbi.nlm.nih.gov/pubmed/36474140 http://dx.doi.org/10.1186/s12859-022-04993-4


Similar items