Ontology-based venous thromboembolism risk assessment model developing from medical records

BACKGROUND: The Padua linear model is widely used for risk assessment of venous thromboembolism (VTE), a common but preventable complication in inpatients. However, genetic and environmental differences between Western and Chinese populations limit the validity of the Padua model in Chinese patients. Me...

Bibliographic Details
Main authors: Yang, Yuqing, Wang, Xin, Huang, Yu, Chen, Ning, Shi, Juhong, Chen, Ting
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6686216/
https://www.ncbi.nlm.nih.gov/pubmed/31391095
http://dx.doi.org/10.1186/s12911-019-0856-2
author Yang, Yuqing
Wang, Xin
Huang, Yu
Chen, Ning
Shi, Juhong
Chen, Ting
collection PubMed
description BACKGROUND: The Padua linear model is widely used for risk assessment of venous thromboembolism (VTE), a common but preventable complication in inpatients. However, genetic and environmental differences between Western and Chinese populations limit the validity of the Padua model in Chinese patients. Medical records, which contain rich information about disease progression, are useful for mining new risk factors related to Chinese VTE patients. Furthermore, machine learning (ML) methods provide new opportunities to build a precise risk prediction model by automatically selecting risk factors from the original medical records. METHODS: Medical records of 3,106 inpatients, including 224 VTE patients, were collected, and various types of ontologies were integrated to parse unstructured text. A workflow for an ontology-based VTE risk prediction model that combines natural language processing (NLP) and ML technologies was proposed. First, ontology terms were extracted from medical records and sorted according to their calculated weights. Next, the importance of each term was evaluated section by section, and finally an ML model was built on a subset of terms. Four ML methods were tested, and the best model was chosen by comparing the area under the receiver operating characteristic curve (AUROC). RESULTS: Medical records were first split into sections; terms from each section were then sorted by weights calculated from multiple types of information. A greedy selection algorithm was used to obtain significant sections and terms. Top terms in each section were selected to construct patients’ distributed representations using a word embedding technique. Using the top 300 terms of two important sections, namely the ‘Progress Note’ section and the ‘Admitting Diagnosis’ section, the model showed relatively better predictive performance. The ML model that used a subset of about 110 terms from these two sections achieved the best AUROC, 0.973 ± 0.006, which was significantly better than the Padua model’s 0.791 ± 0.022. Terms found by the model showed potential to help clinicians explore new risk factors. CONCLUSIONS: In this study, a new VTE risk assessment model based on ontology extraction from raw medical records was developed, and its performance was verified on a real clinical dataset. The selected terms can help clinicians discover meaningful risk factors.
format Online
Article
Text
id pubmed-6686216
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6686216 2019-08-12 Ontology-based venous thromboembolism risk assessment model developing from medical records Yang, Yuqing Wang, Xin Huang, Yu Chen, Ning Shi, Juhong Chen, Ting BMC Med Inform Decis Mak Research BioMed Central 2019-08-08 /pmc/articles/PMC6686216/ /pubmed/31391095 http://dx.doi.org/10.1186/s12911-019-0856-2 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Ontology-based venous thromboembolism risk assessment model developing from medical records
topic Research
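
The abstract mentions a greedy selection algorithm and model comparison by AUROC but gives no implementation details. The following is only a minimal sketch of the general idea, not the authors' method: it assumes a toy risk score (the sum of the selected terms' counts per patient) and uses invented term names and data. The helper names `auroc` and `greedy_select` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case outscores a
    random negative case; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def greedy_select(
    term_counts: Dict[str, List[float]],
    labels: Sequence[int],
    max_terms: int,
) -> Tuple[List[str], float]:
    """Forward selection: repeatedly add the term that most improves the
    AUROC of the toy score, and stop when no remaining term improves it."""
    selected: List[str] = []
    best = 0.5  # AUROC of a constant, uninformative score
    current = [0.0] * len(labels)
    while len(selected) < max_terms:
        best_term, best_auc, best_scores = None, best, None
        for term in sorted(term_counts):  # sorted for deterministic ties
            if term in selected:
                continue
            trial = [c + x for c, x in zip(current, term_counts[term])]
            auc = auroc(labels, trial)
            if auc > best_auc:
                best_term, best_auc, best_scores = term, auc, trial
        if best_term is None:
            break
        selected.append(best_term)
        best, current = best_auc, best_scores
    return selected, best


# Invented toy data: 5 patients, the first two labeled as VTE cases.
labels = [1, 1, 0, 0, 0]
term_counts = {
    "immobilization": [2, 0, 0, 0, 1],
    "fracture": [0, 2, 0, 0, 0],
    "cough": [0, 1, 1, 0, 1],
}
selected, best = greedy_select(term_counts, labels, max_terms=3)
print(selected, best)
```

In practice the toy score would be replaced by a cross-validated ML model, and the AUROC would be estimated on held-out data rather than on the data used for selection.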