An annotated corpus with nanomedicine and pharmacokinetic parameters

Bibliographic Details
Main Authors: Lewinski, Nastassja A; Jimenez, Ivan; McInnes, Bridget T
Format: Online Article Text
Language: English
Published: Dove Medical Press 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5644562/
https://www.ncbi.nlm.nih.gov/pubmed/29066897
http://dx.doi.org/10.2147/IJN.S137117
author Lewinski, Nastassja A
Jimenez, Ivan
McInnes, Bridget T
collection PubMed
description A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration’s Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided.
format Online
Article
Text
id pubmed-5644562
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Dove Medical Press
record_format MEDLINE/PubMed
spelling pubmed-5644562 2017-10-24 An annotated corpus with nanomedicine and pharmacokinetic parameters. Lewinski, Nastassja A; Jimenez, Ivan; McInnes, Bridget T. Int J Nanomedicine. Original Research. Dove Medical Press 2017-10-12 /pmc/articles/PMC5644562/ /pubmed/29066897 http://dx.doi.org/10.2147/IJN.S137117 Text en © 2017 Lewinski et al. This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution – Non Commercial (unported, v3.0) License (http://creativecommons.org/licenses/by-nc/3.0/).
By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed.
title An annotated corpus with nanomedicine and pharmacokinetic parameters
topic Original Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5644562/
https://www.ncbi.nlm.nih.gov/pubmed/29066897
http://dx.doi.org/10.2147/IJN.S137117