
Use of text-mining methods to improve efficiency in the calculation of drug exposure to support pharmacoepidemiology studies

BACKGROUND: Efficient generation of structured dose instructions that enable researchers to calculate drug exposure is central to pharmacoepidemiology studies. Our aim was to design and test an algorithm to codify dose instructions, applied to the NHS Scotland Prescribing Information System (PIS) th...


Bibliographic Details
Main Authors:	McTaggart, Stuart; Nangle, Clifford; Caldwell, Jacqueline; Alvarez-Madrazo, Samantha; Colhoun, Helen; Bennie, Marion
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2018
Subjects:	Methods
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5913611/ https://www.ncbi.nlm.nih.gov/pubmed/29420741 http://dx.doi.org/10.1093/ije/dyx264

author	McTaggart, Stuart; Nangle, Clifford; Caldwell, Jacqueline; Alvarez-Madrazo, Samantha; Colhoun, Helen; Bennie, Marion
collection	PubMed
description	BACKGROUND: Efficient generation of structured dose instructions that enable researchers to calculate drug exposure is central to pharmacoepidemiology studies. Our aim was to design and test an algorithm to codify dose instructions, applied to the NHS Scotland Prescribing Information System (PIS) that records about 100 million prescriptions per annum. METHODS: A natural language processing (NLP) algorithm was developed that enabled free-text dose instructions to be represented by three attributes – quantity, frequency and qualifier – specified by three, three and two variables, respectively. A sample of 15 593 distinct dose instructions was used to test, validate and refine the algorithm. The final algorithm used a zero-assumption approach and was then applied to the full dataset. RESULTS: The initial algorithm generated structured output for 13 152 (84.34%) of the 15 593 sample dose instructions, and reviewers identified 767 (5.83%) incorrect translations, giving an accuracy of 94.17%. Following subsequent refinement of the algorithm rules, application to the full dataset of 458 227 687 prescriptions (99.67% had dose instructions represented by 4 964 083 distinct instructions) generated a structured output for 92.3% of dose instruction texts. This varied by therapeutic area (from 86.7% for the central nervous system to 96.8% for the cardiovascular system). CONCLUSIONS: We created an NLP algorithm, operational at scale, to produce structured output that gives data users maximum flexibility to formulate, test and apply their own assumptions according to the medicines under investigation. Text mining approaches can provide a solution to the safe and efficient management and provisioning of large volumes of data generated through our health systems.
format	Online Article Text
id	pubmed-5913611
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-5913611 2018-04-30 Int J Epidemiol Methods Oxford University Press 2018-04 2018-02-06 /pmc/articles/PMC5913611/ /pubmed/29420741 http://dx.doi.org/10.1093/ije/dyx264 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of the International Epidemiological Association. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Use of text-mining methods to improve efficiency in the calculation of drug exposure to support pharmacoepidemiology studies
topic	Methods
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5913611/ https://www.ncbi.nlm.nih.gov/pubmed/29420741 http://dx.doi.org/10.1093/ije/dyx264
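
The METHODS summary in the description field says that each free-text dose instruction is represented by three attributes (quantity, frequency and qualifier), specified by three, three and two variables respectively, and that the final algorithm makes no assumptions. The record does not name those variables or list the rules, so the following is only a minimal Python sketch of the general idea: every field name, regular expression and vocabulary list here is a hypothetical illustration, not the authors' algorithm. Fields that the text does not state are left empty rather than guessed.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructuredDose:
    # Hypothetical field names: the abstract only gives the counts (3 + 3 + 2).
    quantity_min: Optional[float] = None
    quantity_max: Optional[float] = None
    quantity_unit: Optional[str] = None
    frequency_min: Optional[float] = None
    frequency_max: Optional[float] = None
    frequency_interval: Optional[str] = None
    as_required: bool = False
    as_directed: bool = False


# Tiny illustrative vocabularies; a real rule set would be far larger.
NUMBER_WORDS = {"half": 0.5, "one": 1.0, "two": 2.0, "three": 3.0, "four": 4.0}
ONCE_TWICE = {"once": 1.0, "twice": 2.0}
INTERVALS = {"day": "day", "daily": "day", "week": "week", "weekly": "week"}

NUM = r"(\d+(?:\.\d+)?|half|one|two|three|four)"
RANGE = rf"{NUM}(?:\s*(?:-|to|or)\s*{NUM})?"
QUANTITY_RE = re.compile(rf"\b{RANGE}\s*(tablet|capsule|puff|drop|ml)s?\b")
FREQUENCY_RE = re.compile(
    rf"\b(?:(once|twice)|{RANGE}\s*times)\s*(?:(?:a|per|each)\s+)?"
    r"(day|daily|week|weekly)\b"
)
AS_REQUIRED_RE = re.compile(r"\b(?:as required|when required|as needed|prn)\b")
AS_DIRECTED_RE = re.compile(r"\bas directed\b")


def to_number(token: str) -> float:
    """Convert a digit string or a known number word to a float."""
    return NUMBER_WORDS[token] if token in NUMBER_WORDS else float(token)


def parse_dose_instruction(text: str) -> StructuredDose:
    """Map one free-text instruction to structured fields, leaving gaps empty."""
    text = text.lower().strip()
    dose = StructuredDose()

    q = QUANTITY_RE.search(text)
    if q:
        dose.quantity_min = to_number(q.group(1))
        # A single stated value is stored as both the minimum and the maximum.
        dose.quantity_max = to_number(q.group(2)) if q.group(2) else dose.quantity_min
        dose.quantity_unit = q.group(3)

    f = FREQUENCY_RE.search(text)
    if f:
        if f.group(1):
            dose.frequency_min = ONCE_TWICE[f.group(1)]
            dose.frequency_max = dose.frequency_min
        else:
            dose.frequency_min = to_number(f.group(2))
            dose.frequency_max = to_number(f.group(3)) if f.group(3) else dose.frequency_min
        dose.frequency_interval = INTERVALS[f.group(4)]

    dose.as_required = bool(AS_REQUIRED_RE.search(text))
    dose.as_directed = bool(AS_DIRECTED_RE.search(text))
    return dose


if __name__ == "__main__":
    for example in ("Take one tablet twice a day",
                    "2 puffs four times daily when required",
                    "Use as directed"):
        print(example, "->", parse_dose_instruction(example))
```

In this sketch an instruction such as "Use as directed" yields only the qualifier flag, with quantity and frequency left empty. That mirrors the stated aim of leaving data users free to apply their own assumptions for the medicine under study.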