PubExN: An Automated PubMed Bulk Article Extractor with Affiliation Normalization Package

Biomedical article extraction is a preliminary step for many biomedical applications, which help identify gene, disease, chemical, drug, and protein entities. Finding entity relations such as gene–gene relations, drug–disease interactions, and chemical–protein relations the Pu...

Bibliographic Details
Main Authors: Kumar, Ashutosh, Sharaff, Aakanksha
Format: Online Article Text
Language: English
Published: Springer Nature Singapore 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10132428/
https://www.ncbi.nlm.nih.gov/pubmed/37128512
http://dx.doi.org/10.1007/s42979-023-01687-3
_version_ 1785031385491177472
author Kumar, Ashutosh
Sharaff, Aakanksha
author_facet Kumar, Ashutosh
Sharaff, Aakanksha
author_sort Kumar, Ashutosh
collection PubMed
description Biomedical article extraction is a preliminary step for many biomedical applications, which help identify gene, disease, chemical, drug, and protein entities. PubExN can also support applications that find relations between entities, such as gene–gene relations, drug–disease interactions, and chemical–protein relations. In most cases, domain experts perform this extraction manually. Manual extraction is time-consuming, and there is a high probability that documents will be missed. To simplify this process, a Python package is introduced that automates bulk extraction from the PubMed database. The extraction covers all citation information along with the associated abstract, and articles are retrieved in batches. The motivation for developing PubExN was to make the extraction of biomedical article text data from NCBI's PubMed database more flexible. A PubMed record contains the article ID, also called the PubMed ID (PMID), the article title, the abstract, author information, etc. This package will benefit biomedical text mining research, including biomedical named entity recognition, biomedical relation extraction, literature discovery, knowledge base creation, and various biomedical natural language processing (NLP) tasks. In addition, it could be used for author name disambiguation and new drug discovery. The package will help save the time and effort required to extract and normalize PubMed articles.
format Online
Article
Text
id pubmed-10132428
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer Nature Singapore
record_format MEDLINE/PubMed
spelling pubmed-10132428 2023-04-27 PubExN: An Automated PubMed Bulk Article Extractor with Affiliation Normalization Package Kumar, Ashutosh Sharaff, Aakanksha SN Comput Sci Original Research
Springer Nature Singapore 2023-04-26 2023 /pmc/articles/PMC10132428/ /pubmed/37128512 http://dx.doi.org/10.1007/s42979-023-01687-3 Text en © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2023, Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Original Research
Kumar, Ashutosh
Sharaff, Aakanksha
PubExN: An Automated PubMed Bulk Article Extractor with Affiliation Normalization Package
title PubExN: An Automated PubMed Bulk Article Extractor with Affiliation Normalization Package
title_full PubExN: An Automated PubMed Bulk Article Extractor with Affiliation Normalization Package
title_fullStr PubExN: An Automated PubMed Bulk Article Extractor with Affiliation Normalization Package
title_full_unstemmed PubExN: An Automated PubMed Bulk Article Extractor with Affiliation Normalization Package
title_short PubExN: An Automated PubMed Bulk Article Extractor with Affiliation Normalization Package
title_sort pubexn: an automated pubmed bulk article extractor with affiliation normalization package
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10132428/
https://www.ncbi.nlm.nih.gov/pubmed/37128512
http://dx.doi.org/10.1007/s42979-023-01687-3
work_keys_str_mv AT kumarashutosh pubexnanautomatedpubmedbulkarticleextractorwithaffiliationnormalizationpackage
AT sharaffaakanksha pubexnanautomatedpubmedbulkarticleextractorwithaffiliationnormalizationpackage