
Classifying Free Texts Into Predefined Sections Using AI in Regulatory Documents: A Case Study with Drug Labeling Documents

The US Food and Drug Administration (FDA) regulatory process often involves several reviewers who focus on sets of information related to their respective areas of review. Accordingly, manufacturers that provide submission packages to regulatory agencies are instructed to organize...

Bibliographic Details
Main Authors: Gray, Magnus; Xu, Joshua; Tong, Weida; Wu, Leihong
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10445280/
https://www.ncbi.nlm.nih.gov/pubmed/37487037
http://dx.doi.org/10.1021/acs.chemrestox.3c00028
author Gray, Magnus
Xu, Joshua
Tong, Weida
Wu, Leihong
collection PubMed
description The US Food and Drug Administration (FDA) regulatory process often involves several reviewers who focus on sets of information related to their respective areas of review. Accordingly, manufacturers that provide submission packages to regulatory agencies are instructed to organize the contents using a structure that enables the information to be easily allocated, retrieved, and reviewed. However, this practice is not always followed correctly; as such, some documents are not well structured, with similar information spread across different sections, hindering the efficient access and review of all of the relevant data as a whole. To improve this common situation, we evaluated an artificial intelligence (AI)-based natural language processing (NLP) methodology, called Bidirectional Encoder Representations from Transformers (BERT), to automatically classify free-text information into standardized sections, supporting a holistic review of drug safety and efficacy. Specifically, FDA labeling documents were used in this study as a proof of concept, where the labeling section structure defined by the Physician Labeling Rule (PLR) was used to classify labels in the development of the model. The model was subsequently evaluated on texts from both well-structured labeling documents (i.e., PLR-based labeling) and less- or differently structured documents (i.e., non-PLR and Summary of Product Characteristics [SmPC] labeling). In the training process, the model yielded 96% and 88% accuracy for binary and multiclass tasks, respectively. The testing accuracies observed for the PLR, non-PLR, and SmPC testing data sets for the binary model were 95%, 88%, and 88%, and for the multiclass model were 82%, 73%, and 68%, respectively.
Our study demonstrated that automatically classifying free texts into standardized sections with AI language models could be an advanced regulatory science approach for supporting the review process by effectively processing unformatted documents.
format Online
Article
Text
id pubmed-10445280
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10445280 2023-08-24 Classifying Free Texts Into Predefined Sections Using AI in Regulatory Documents: A Case Study with Drug Labeling Documents Gray, Magnus Xu, Joshua Tong, Weida Wu, Leihong Chem Res Toxicol American Chemical Society 2023-07-24 /pmc/articles/PMC10445280/ /pubmed/37487037 http://dx.doi.org/10.1021/acs.chemrestox.3c00028 Text en Not subject to U.S. Copyright. Published 2023 by American Chemical Society https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title Classifying Free Texts Into Predefined Sections Using AI in Regulatory Documents: A Case Study with Drug Labeling Documents
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10445280/
https://www.ncbi.nlm.nih.gov/pubmed/37487037
http://dx.doi.org/10.1021/acs.chemrestox.3c00028