CAT: computer aided triage improving upon the Bayes risk through ε-refusal triage rules

BACKGROUND: Manual extraction of information from electronic pathology (epath) reports to populate the Surveillance, Epidemiology, and End Result (SEER) database is labor intensive. Systematizing the data extraction automatically using machine-learning (ML) and natural language processing (NLP) is d...

Bibliographic Details
Main Authors:	Hengartner, Nicolas, Cuellar, Leticia, Wu, Xiao-Cheng, Tourassi, Georgia, Qiu, John, Christian, Blair, Bhattacharya, Tanmoy
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6302364/ https://www.ncbi.nlm.nih.gov/pubmed/30577756 http://dx.doi.org/10.1186/s12859-018-2503-9

_version_	1783381961638674432
author	Hengartner, Nicolas Cuellar, Leticia Wu, Xiao-Cheng Tourassi, Georgia Qiu, John Christian, Blair Bhattacharya, Tanmoy
author_facet	Hengartner, Nicolas Cuellar, Leticia Wu, Xiao-Cheng Tourassi, Georgia Qiu, John Christian, Blair Bhattacharya, Tanmoy
author_sort	Hengartner, Nicolas
collection	PubMed
description	BACKGROUND: Manual extraction of information from electronic pathology (epath) reports to populate the Surveillance, Epidemiology, and End Result (SEER) database is labor intensive. Automating the data extraction using machine learning (ML) and natural language processing (NLP) is desirable to reduce the human labor required to populate the SEER database and to improve the timeliness of the data. This enables scaling up registry efficiency and the collection of new data elements. To ensure the integrity, quality, and continuity of the SEER data, the misclassification error of ML and NLP algorithms needs to be negligible. Current algorithms fail to achieve the precision of human experts, who can bring additional information to their assessments. Differences in registry format and the desire to develop a common information extraction platform further complicate the ML/NLP tasks. The purpose of our study is to develop triage rules that partially automate the registry workflow and improve the precision of the auto-extracted information. RESULTS: This paper presents a mathematical framework to improve the precision of a classifier beyond that of the Bayes classifier by selectively classifying the items that are most likely to be classified correctly. This results in a triage rule that classifies only a subset of the items. We characterize the optimal triage rule and demonstrate its usefulness in the problem of classifying cancer site from electronic pathology reports to achieve a desired precision. CONCLUSIONS: From the mathematical formalism, we propose a heuristic estimate for the triage rule based on post-processing the soft-max output from standard machine learning algorithms. We show, in test cases, that the triage rule significantly improves the classification accuracy.
format	Online Article Text
id	pubmed-6302364
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6302364 2018-12-31 CAT: computer aided triage improving upon the Bayes risk through ε-refusal triage rules Hengartner, Nicolas Cuellar, Leticia Wu, Xiao-Cheng Tourassi, Georgia Qiu, John Christian, Blair Bhattacharya, Tanmoy BMC Bioinformatics Research BioMed Central 2018-12-21 /pmc/articles/PMC6302364/ /pubmed/30577756 http://dx.doi.org/10.1186/s12859-018-2503-9 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Hengartner, Nicolas Cuellar, Leticia Wu, Xiao-Cheng Tourassi, Georgia Qiu, John Christian, Blair Bhattacharya, Tanmoy CAT: computer aided triage improving upon the Bayes risk through ε-refusal triage rules
title	CAT: computer aided triage improving upon the Bayes risk through ε-refusal triage rules
title_sort	cat: computer aided triage improving upon the bayes risk through ε-refusal triage rules
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6302364/ https://www.ncbi.nlm.nih.gov/pubmed/30577756 http://dx.doi.org/10.1186/s12859-018-2503-9
work_keys_str_mv	AT hengartnernicolas catcomputeraidedtriageimprovinguponthebayesriskthrougherefusaltriagerules AT cuellarleticia catcomputeraidedtriageimprovinguponthebayesriskthrougherefusaltriagerules AT wuxiaocheng catcomputeraidedtriageimprovinguponthebayesriskthrougherefusaltriagerules AT tourassigeorgia catcomputeraidedtriageimprovinguponthebayesriskthrougherefusaltriagerules AT qiujohn catcomputeraidedtriageimprovinguponthebayesriskthrougherefusaltriagerules AT christianblair catcomputeraidedtriageimprovinguponthebayesriskthrougherefusaltriagerules AT bhattacharyatanmoy catcomputeraidedtriageimprovinguponthebayesriskthrougherefusaltriagerules
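
The abstract describes a triage rule obtained by post-processing the soft-max output of a standard classifier so that only a subset of items is classified automatically. The record does not give the paper's exact rule, so the following is only a minimal sketch of one common heuristic of that kind: accept the arg-max class when its soft-max probability reaches a threshold, and otherwise refuse and leave the item for manual review. The function names `triage` and `evaluate`, the threshold values, and the toy probabilities and labels are all assumptions made for illustration, not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def triage(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the arg-max class if its probability reaches the threshold.

    Returns None (a refusal) when the classifier is not confident enough.
    This max-probability cutoff is an illustrative assumption, not the
    paper's optimal rule.
    """
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best if probs[best] >= threshold else None


def evaluate(
    prob_rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float,
) -> Tuple[float, float]:
    """Return (coverage, accuracy on the accepted items)."""
    decisions: List[Optional[int]] = [triage(p, threshold) for p in prob_rows]
    accepted = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    coverage = len(accepted) / len(labels)
    if not accepted:
        return coverage, float("nan")
    accuracy = sum(d == y for d, y in accepted) / len(accepted)
    return coverage, accuracy


# Made-up soft-max outputs for four items and three classes.
probs = [
    [0.95, 0.03, 0.02],
    [0.50, 0.30, 0.20],
    [0.10, 0.85, 0.05],
    [0.40, 0.35, 0.25],
]
labels = [0, 1, 1, 2]

baseline = evaluate(probs, labels, threshold=0.0)   # classify everything
selective = evaluate(probs, labels, threshold=0.8)  # refuse uncertain items
print("baseline  (coverage, accuracy):", baseline)
print("selective (coverage, accuracy):", selective)
```

On this toy data, raising the threshold trades coverage for accuracy on the accepted items, which is the kind of trade-off a triage rule is meant to control. In practice the threshold would be tuned on held-out data to reach a target precision.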
