Augmenting Product Defect Surveillance Through Web Crawling and Machine Learning in Singapore
INTRODUCTION: Substandard medicines are medicines that fail to meet their quality standards and/or specifications. Substandard medicines can lead to serious safety issues affecting public health. With the increasing number of pharmaceuticals and the complexity of the pharmaceutical manufacturing sup...
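Main Authors: | Ang, Pei San; Teo, Desmond Chun Hwee; Dorajoo, Sreemanee Raaj; Prem Kumar, Mukundaram; Chan, Yi Hao; Choong, Chih Tzer; Phuah, Doris Sock Tin; Tan, Dorothy Hooi Myn; Tan, Filina Meixuan; Huang, Huilin; Tan, Maggie Siok Hwee; Ng, Michelle Sau Yuen; Poh, Jalene Wang Woon |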
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8214454/ https://www.ncbi.nlm.nih.gov/pubmed/34148223 http://dx.doi.org/10.1007/s40264-021-01084-w |
_version_ | 1783710066345508864 |
---|---|
author | Ang, Pei San; Teo, Desmond Chun Hwee; Dorajoo, Sreemanee Raaj; Prem Kumar, Mukundaram; Chan, Yi Hao; Choong, Chih Tzer; Phuah, Doris Sock Tin; Tan, Dorothy Hooi Myn; Tan, Filina Meixuan; Huang, Huilin; Tan, Maggie Siok Hwee; Ng, Michelle Sau Yuen; Poh, Jalene Wang Woon |
author_facet | Ang, Pei San; Teo, Desmond Chun Hwee; Dorajoo, Sreemanee Raaj; Prem Kumar, Mukundaram; Chan, Yi Hao; Choong, Chih Tzer; Phuah, Doris Sock Tin; Tan, Dorothy Hooi Myn; Tan, Filina Meixuan; Huang, Huilin; Tan, Maggie Siok Hwee; Ng, Michelle Sau Yuen; Poh, Jalene Wang Woon |
author_sort | Ang, Pei San |
collection | PubMed |
description | INTRODUCTION: Substandard medicines are medicines that fail to meet their quality standards and/or specifications. Substandard medicines can lead to serious safety issues affecting public health. With the increasing number of pharmaceuticals and the complexity of the pharmaceutical manufacturing supply chain, monitoring for substandard medicines via manual environmental scanning can be laborious and time consuming. METHODS: A web crawler was developed to automatically detect and extract alerts on substandard medicines published on the Internet by regulatory agencies. The crawled data were labelled as related to substandard medicines or not. An expert-derived keyword-based classification algorithm was compared against machine learning algorithms to identify substandard medicine alerts on two validation datasets (n = 4920 and n = 2458) from a later time period than training data. Models were comparatively assessed for recall, precision and their F1 scores (harmonic mean of precision and recall). RESULTS: The web crawler routinely extracted alerts from the 46 web pages belonging to nine regulatory agencies. From October 2019 to May 2020, 12,156 unique alerts were crawled of which 7378 (60.7%) alerts were set aside for validation and contained 1160 substandard medicine alerts (15.7%). An ensemble approach of combining machine learning and keywords achieved the best recall (94% and 97%), precision (85% and 80%) and F1 scores (89% and 88%) on temporal validation. CONCLUSIONS: Combining robust web crawler programmes with rigorously tested filtering algorithms based on machine learning and keyword models can automate and expand horizon scanning capabilities for issues relating to substandard medicines. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s40264-021-01084-w. |
format | Online Article Text |
id | pubmed-8214454 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-8214454 2021-06-21 Augmenting Product Defect Surveillance Through Web Crawling and Machine Learning in Singapore Ang, Pei San; Teo, Desmond Chun Hwee; Dorajoo, Sreemanee Raaj; Prem Kumar, Mukundaram; Chan, Yi Hao; Choong, Chih Tzer; Phuah, Doris Sock Tin; Tan, Dorothy Hooi Myn; Tan, Filina Meixuan; Huang, Huilin; Tan, Maggie Siok Hwee; Ng, Michelle Sau Yuen; Poh, Jalene Wang Woon Drug Saf Original Research Article INTRODUCTION: Substandard medicines are medicines that fail to meet their quality standards and/or specifications. Substandard medicines can lead to serious safety issues affecting public health. With the increasing number of pharmaceuticals and the complexity of the pharmaceutical manufacturing supply chain, monitoring for substandard medicines via manual environmental scanning can be laborious and time consuming. METHODS: A web crawler was developed to automatically detect and extract alerts on substandard medicines published on the Internet by regulatory agencies. The crawled data were labelled as related to substandard medicines or not. An expert-derived keyword-based classification algorithm was compared against machine learning algorithms to identify substandard medicine alerts on two validation datasets (n = 4920 and n = 2458) from a later time period than training data. Models were comparatively assessed for recall, precision and their F1 scores (harmonic mean of precision and recall). RESULTS: The web crawler routinely extracted alerts from the 46 web pages belonging to nine regulatory agencies. From October 2019 to May 2020, 12,156 unique alerts were crawled of which 7378 (60.7%) alerts were set aside for validation and contained 1160 substandard medicine alerts (15.7%). An ensemble approach of combining machine learning and keywords achieved the best recall (94% and 97%), precision (85% and 80%) and F1 scores (89% and 88%) on temporal validation. CONCLUSIONS: Combining robust web crawler programmes with rigorously tested filtering algorithms based on machine learning and keyword models can automate and expand horizon scanning capabilities for issues relating to substandard medicines. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s40264-021-01084-w. Springer International Publishing 2021-06-19 2021 /pmc/articles/PMC8214454/ /pubmed/34148223 http://dx.doi.org/10.1007/s40264-021-01084-w Text en © The Author(s), under exclusive licence to Springer Nature Switzerland AG 2021 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Original Research Article Ang, Pei San; Teo, Desmond Chun Hwee; Dorajoo, Sreemanee Raaj; Prem Kumar, Mukundaram; Chan, Yi Hao; Choong, Chih Tzer; Phuah, Doris Sock Tin; Tan, Dorothy Hooi Myn; Tan, Filina Meixuan; Huang, Huilin; Tan, Maggie Siok Hwee; Ng, Michelle Sau Yuen; Poh, Jalene Wang Woon Augmenting Product Defect Surveillance Through Web Crawling and Machine Learning in Singapore |
title | Augmenting Product Defect Surveillance Through Web Crawling and Machine Learning in Singapore |
title_full | Augmenting Product Defect Surveillance Through Web Crawling and Machine Learning in Singapore |
title_fullStr | Augmenting Product Defect Surveillance Through Web Crawling and Machine Learning in Singapore |
title_full_unstemmed | Augmenting Product Defect Surveillance Through Web Crawling and Machine Learning in Singapore |
title_short | Augmenting Product Defect Surveillance Through Web Crawling and Machine Learning in Singapore |
title_sort | augmenting product defect surveillance through web crawling and machine learning in singapore |
topic | Original Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8214454/ https://www.ncbi.nlm.nih.gov/pubmed/34148223 http://dx.doi.org/10.1007/s40264-021-01084-w |
work_keys_str_mv | AT angpeisan augmentingproductdefectsurveillancethroughwebcrawlingandmachinelearninginsingapore AT teodesmondchunhwee augmentingproductdefectsurveillancethroughwebcrawlingandmachinelearninginsingapore AT dorajoosreemaneeraaj augmentingproductdefectsurveillancethroughwebcrawlingandmachinelearninginsingapore AT premkumarmukundaram augmentingproductdefectsurveillancethroughwebcrawlingandmachinelearninginsingapore AT chanyihao augmentingproductdefectsurveillancethroughwebcrawlingandmachinelearninginsingapore AT choongchihtzer augmentingproductdefectsurveillancethroughwebcrawlingandmachinelearninginsingapore AT phuahdorissocktin augmentingproductdefectsurveillancethroughwebcrawlingandmachinelearninginsingapore AT tandorothyhooimyn augmentingproductdefectsurveillancethroughwebcrawlingandmachinelearninginsingapore AT tanfilinameixuan augmentingproductdefectsurveillancethroughwebcrawlingandmachinelearninginsingapore AT huanghuilin augmentingproductdefectsurveillancethroughwebcrawlingandmachinelearninginsingapore AT tanmaggiesiokhwee augmentingproductdefectsurveillancethroughwebcrawlingandmachinelearninginsingapore AT ngmichellesauyuen augmentingproductdefectsurveillancethroughwebcrawlingandmachinelearninginsingapore AT pohjalenewangwoon augmentingproductdefectsurveillancethroughwebcrawlingandmachinelearninginsingapore |
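
The description field defines the F1 score as the harmonic mean of precision and recall and quotes several percentages. As a minimal sketch (not the authors' code), the arithmetic can be checked in Python as follows. The function name `f1_score` and the variable names are illustrative assumptions; only the numbers come from the record, and the pairing of each recall value with the precision value listed in the same position is assumed from the order in which the abstract gives them.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the definition given in the abstract)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Ensemble results as quoted in the abstract, in the order listed there.
# Pairing values by position is an assumption of this sketch.
reported = [
    {"recall": 0.94, "precision": 0.85, "f1": 0.89},
    {"recall": 0.97, "precision": 0.80, "f1": 0.88},
]

for row in reported:
    computed = round(f1_score(row["precision"], row["recall"]), 2)
    print(f"reported F1 = {row['f1']:.2f}, computed F1 = {computed:.2f}")

# Dataset counts quoted in the abstract.
total_crawled = 12156          # unique alerts, October 2019 to May 2020
validation_sizes = [4920, 2458]
validation_total = sum(validation_sizes)
substandard_in_validation = 1160

validation_share = round(100 * validation_total / total_crawled, 1)
substandard_share = round(100 * substandard_in_validation / validation_total, 1)
print(validation_total, validation_share, substandard_share)
```

Under these assumptions the recomputed values agree with the rounded figures in the record: the two validation sets sum to 7378 alerts, and the quoted shares and F1 scores follow from the quoted counts, precision, and recall.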