Cargando…

Named entity recognition on bio-medical literature documents using hybrid based approach

There have been many changes in the medical field due to technological advances. The progression in technologies provides lot of opportunities to extract valuable insights from huge amount of unstructured data. The literature documents published by the researchers in medical domain consists enormous...

Descripción completa

Detalles Bibliográficos
Autores principales: Ramachandran, R., Arutchelvan, K.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Springer Berlin Heidelberg 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7947151/
https://www.ncbi.nlm.nih.gov/pubmed/33723489
http://dx.doi.org/10.1007/s12652-021-03078-z
Descripción
Sumario:There have been many changes in the medical field due to technological advances. The progression in technologies provides lot of opportunities to extract valuable insights from huge amount of unstructured data. The literature documents published by the researchers in medical domain consists enormous amount of knowledge. Many organizations are involving in retrieving the hidden information from the literature documents. Extracting the drug names, diseases, symptoms, route of administration, species and dosage forms from the textual document is an easy task due to the innovation of technologies in the Natural Language Processing. In this article, a new hybrid based approach is proposed to identify named entity from the medical literature documents. New dictionary has been built for route of administration, dosage forms and symptoms to annotate the entities in the medical documents. The annotated entities are trained by the blank Spacy machine learning model. The trained model provide a decent accuracy when compared with the existing model. The hybrid model is validated with the dictionary and human (optional)to calculate the confusion matrix. It is able to identify more entities than the prevailing model. The average F1 score for five entities of the proposed hybrid based approach 73.79%.