
Using an artificial neural network to map cancer common data elements to the biomedical research integrated domain group model in a semi-automated manner

BACKGROUND: The medical community uses a variety of data standards for both clinical and research reporting needs. ISO 11179 Common Data Elements (CDEs) represent one such standard that provides robust data point definitions. Another standard is the Biomedical Research Integrated Domain Group (BRIDG...


Bibliographic Details
Main authors: Renner, Robinette, Li, Shengyu, Huang, Yulong, van der Zijp-Tan, Ada Chaeli, Tan, Shaobo, Li, Dongqi, Kasukurthi, Mohan Vamsi, Benton, Ryan, Borchert, Glen M., Huang, Jingshan, Jiang, Guoqian
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6927104/
https://www.ncbi.nlm.nih.gov/pubmed/31865899
http://dx.doi.org/10.1186/s12911-019-0979-5
_version_ 1783482240242548736
author Renner, Robinette
Li, Shengyu
Huang, Yulong
van der Zijp-Tan, Ada Chaeli
Tan, Shaobo
Li, Dongqi
Kasukurthi, Mohan Vamsi
Benton, Ryan
Borchert, Glen M.
Huang, Jingshan
Jiang, Guoqian
author_sort Renner, Robinette
collection PubMed
description BACKGROUND: The medical community uses a variety of data standards for both clinical and research reporting needs. ISO 11179 Common Data Elements (CDEs) represent one such standard that provides robust data point definitions. Another standard is the Biomedical Research Integrated Domain Group (BRIDG) model, a domain analysis model that provides a contextual framework for biomedical and clinical research data. Mapping the CDEs to the BRIDG model is important; in particular, it can facilitate mapping the CDEs to other standards. Unfortunately, manual mapping, which is the current method for creating the CDE mappings, is error-prone and time-consuming; this creates a significant barrier for researchers who utilize CDEs.
METHODS: In this work, we developed a semi-automated algorithm to map CDEs to likely BRIDG classes. First, we extended and improved our previously developed artificial neural network (ANN) alignment algorithm. We then used a collection of 1284 CDEs with robust mappings to BRIDG classes as the gold standard to train the network and obtain appropriate weights for six CDE attributes. Afterward, we calculated the similarity between a CDE and each BRIDG class. Finally, the algorithm produced a list of candidate BRIDG classes to which the CDE of interest may belong.
RESULTS: For CDEs semantically similar to those used in training, a match rate of over 90% was achieved. For those partially similar, a match rate of 80% was obtained, and for those with drastically different semantics, a match rate of up to 70% was achieved.
DISCUSSION: Our semi-automated mapping process reduces the burden on domain experts. The weights of all six attributes are significant. Experimental results indicate that the availability of training data is more important than the semantic similarity of the testing data to the training data. We address the overfitting problem by selecting CDEs randomly and adjusting the ratio of training to verification samples.
CONCLUSIONS: Experimental results on real-world use cases demonstrate the effectiveness and efficiency of our proposed methodology in mapping CDEs to BRIDG classes, both for CDEs seen before and for new, unseen CDEs. In addition, it reduces the mapping burden and improves the mapping quality.
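The ranking step described in METHODS (weight six CDE attributes, score the CDE against each BRIDG class, return a candidate list) can be illustrated with a minimal sketch. This is not the authors' implementation: the attribute names, the token-overlap (Jaccard) similarity, the equal weights, and the toy class descriptions are all assumptions made for illustration. In the article, the weights are learned by the ANN from the 1284 gold-standard mappings.

```python
"""Toy sketch: rank candidate BRIDG classes for one CDE by weighted similarity."""

# Hypothetical attribute names. The abstract says six CDE attributes are
# weighted but does not list them, so these are placeholders.
ATTRIBUTES = (
    "long_name", "definition", "object_class",
    "property", "value_domain", "context",
)


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (a stand-in similarity measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def weighted_similarity(cde: dict, class_text: str, weights: dict) -> float:
    """Weighted average of per-attribute similarities, in the range [0, 1]."""
    total_weight = sum(weights[name] for name in ATTRIBUTES)
    if total_weight == 0:
        return 0.0
    score = sum(
        weights[name] * token_jaccard(cde.get(name, ""), class_text)
        for name in ATTRIBUTES
    )
    return score / total_weight


def rank_candidates(cde: dict, bridg_classes: dict, weights: dict, top_k: int = 3):
    """Return the top_k (class name, score) pairs, best score first."""
    scored = [
        (name, weighted_similarity(cde, text, weights))
        for name, text in bridg_classes.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data, invented for this example only.
example_cde = {
    "long_name": "Patient Birth Date",
    "definition": "the date on which the patient was born",
    "object_class": "person",
    "property": "birth date",
    "value_domain": "date",
    "context": "cancer clinical trial",
}
example_classes = {
    "BiologicEntity": "person or animal with birth date and death date",
    "Drug": "chemical substance administered as treatment",
    "StudyProtocol": "plan for a clinical trial study",
}
# Placeholder weights; a trained model would supply learned values instead.
example_weights = {name: 1.0 for name in ATTRIBUTES}

if __name__ == "__main__":
    for class_name, score in rank_candidates(example_cde, example_classes, example_weights):
        print(f"{class_name}: {score:.3f}")
```

Returning a short ranked list rather than a single class mirrors the semi-automated design: a domain expert still makes the final choice, but from a few candidates instead of the whole model.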
format Online
Article
Text
id pubmed-6927104
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6927104 2019-12-30 BMC Med Inform Decis Mak Research BioMed Central 2019-12-23 /pmc/articles/PMC6927104/ /pubmed/31865899 http://dx.doi.org/10.1186/s12911-019-0979-5 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Using an artificial neural network to map cancer common data elements to the biomedical research integrated domain group model in a semi-automated manner
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6927104/
https://www.ncbi.nlm.nih.gov/pubmed/31865899
http://dx.doi.org/10.1186/s12911-019-0979-5