
Word2Vec inversion and traditional text classifiers for phenotyping lupus

BACKGROUND: Identifying patients with certain clinical criteria based on manual chart review of doctors’ notes is a daunting task given the massive amounts of text notes in the electronic health records (EHR). This task can be automated using text classifiers based on Natural Language Processing (NL...


Bibliographic Details
Main authors:	Turner, Clayton A., Jacobs, Alexander D., Marques, Cassios K., Oates, James C., Kamen, Diane L., Anderson, Paul E., Obeid, Jihad S.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5568290/ https://www.ncbi.nlm.nih.gov/pubmed/28830409 http://dx.doi.org/10.1186/s12911-017-0518-1

author	Turner, Clayton A.; Jacobs, Alexander D.; Marques, Cassios K.; Oates, James C.; Kamen, Diane L.; Anderson, Paul E.; Obeid, Jihad S.
author_sort	Turner, Clayton A.
collection	PubMed
description	BACKGROUND: Identifying patients with certain clinical criteria based on manual chart review of doctors’ notes is a daunting task given the massive amounts of text notes in the electronic health records (EHR). This task can be automated using text classifiers based on Natural Language Processing (NLP) techniques along with pattern recognition machine learning (ML) algorithms. The aim of this research is to evaluate the performance of traditional classifiers for identifying patients with Systemic Lupus Erythematosus (SLE) in comparison with a newer Bayesian word vector method.
METHODS: We obtained clinical notes for patients with SLE diagnosis along with controls from the Rheumatology Clinic (662 total patients). Sparse bag-of-words (BOWs) and Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs) matrices were produced using NLP pipelines. These matrices were subjected to several different NLP classifiers: neural networks, random forests, naïve Bayes, support vector machines, and Word2Vec inversion, a Bayesian inversion method. Performance was measured by calculating accuracy and area under the Receiver Operating Characteristic (ROC) curve (AUC) of a cross-validated (CV) set and a separate testing set.
RESULTS: We calculated the accuracy of the ICD-9 billing codes as a baseline to be 90.00% with an AUC of 0.900, the shallow neural network with CUIs to be 92.10% with an AUC of 0.970, the random forest with BOWs to be 95.25% with an AUC of 0.994, the random forest with CUIs to be 95.00% with an AUC of 0.979, and the Word2Vec inversion to be 90.03% with an AUC of 0.905.
CONCLUSIONS: Our results suggest that a shallow neural network with CUIs and random forests with both CUIs and BOWs are the best classifiers for this lupus phenotyping task. The Word2Vec inversion method failed to significantly beat the ICD-9 code classification, but yielded promising results. This method does not require explicit features and is more adaptable to non-binary classification tasks. The Word2Vec inversion is hypothesized to become more powerful with access to more data. Therefore, currently, the shallow neural networks and random forests are the desirable classifiers.
format	Online Article Text
id	pubmed-5568290
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5568290 2017-08-29 BMC Med Inform Decis Mak Research Article BioMed Central 2017-08-22 /pmc/articles/PMC5568290/ /pubmed/28830409 http://dx.doi.org/10.1186/s12911-017-0518-1 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Word2Vec inversion and traditional text classifiers for phenotyping lupus
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5568290/ https://www.ncbi.nlm.nih.gov/pubmed/28830409 http://dx.doi.org/10.1186/s12911-017-0518-1
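
The abstract describes Word2Vec inversion only as "a Bayesian inversion method". As a rough illustration of what such an inversion step can look like, the sketch below assumes that one language model per class has already scored a note and returned a log-likelihood, then applies Bayes' rule to turn those scores into class posteriors. The function name, the class labels, the priors, and the log-likelihood values are all hypothetical and are not taken from the article; the article's actual pipeline may differ.

```python
import math


def invert_posteriors(log_likelihoods, priors):
    """Apply Bayes' rule: P(class | doc) is proportional to P(doc | class) * P(class).

    log_likelihoods: dict mapping class label -> log P(doc | class), assumed to
        come from a class-specific language model (hypothetical values here).
    priors: dict mapping class label -> P(class).
    """
    # Work in log space, then subtract the maximum for numerical stability.
    joint = {c: log_likelihoods[c] + math.log(priors[c]) for c in log_likelihoods}
    peak = max(joint.values())
    unnormalized = {c: math.exp(v - peak) for c, v in joint.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


# Hypothetical scores for one clinical note under two class-specific models.
doc_scores = {"sle": -120.0, "control": -123.0}
class_priors = {"sle": 0.5, "control": 0.5}

posteriors = invert_posteriors(doc_scores, class_priors)
predicted = max(posteriors, key=posteriors.get)
print(predicted, posteriors)
```

Because the posterior is computed per class, the same function works unchanged with more than two classes, which is in line with the abstract's remark that the method is more adaptable to non-binary classification tasks.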

