
The GAAIN Entity Mapper: An Active-Learning System for Medical Data Mapping

This work is focused on mapping biomedical datasets to a common representation, as an integral part of data harmonization for integrated biomedical data access and sharing. We present GEM, an intelligent software assistant for automated data mapping across different datasets or from a dataset to a c...


Bibliographic Details
Main authors: Ashish, Naveen, Dewan, Peehoo, Toga, Arthur W.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4710756/
https://www.ncbi.nlm.nih.gov/pubmed/26793094
http://dx.doi.org/10.3389/fninf.2015.00030
_version_ 1782409856691470336
author Ashish, Naveen
Dewan, Peehoo
Toga, Arthur W.
author_facet Ashish, Naveen
Dewan, Peehoo
Toga, Arthur W.
author_sort Ashish, Naveen
collection PubMed
description This work is focused on mapping biomedical datasets to a common representation, as an integral part of data harmonization for integrated biomedical data access and sharing. We present GEM, an intelligent software assistant for automated data mapping across different datasets or from a dataset to a common data model. The GEM system automates data mapping by providing precise suggestions for data element mappings. It leverages the detailed metadata about elements in associated dataset documentation such as data dictionaries that are typically available with biomedical datasets. It employs unsupervised text mining techniques to determine similarity between data elements and also employs machine-learning classifiers to identify element matches. It further provides an active-learning capability where the process of training the GEM system is optimized. Our experimental evaluations show that the GEM system provides highly accurate data mappings (over 90% accuracy) for real datasets of thousands of data elements each, in the Alzheimer's disease research domain. Further, the effort in training the system for new datasets is also optimized. We are currently employing the GEM system to map Alzheimer's disease datasets from around the globe into a common representation, as part of a global Alzheimer's disease integrated data sharing and analysis network called GAAIN. GEM achieves significantly higher data mapping accuracy for biomedical datasets compared to other state-of-the-art tools for database schema matching that have similar functionality. With the use of active-learning capabilities, the user effort in training the system is minimal.
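The description above mentions using text similarity over data dictionary documentation to suggest element mappings. The following is a minimal sketch of that general idea only, not the GEM implementation: it assumes TF-IDF weighting with cosine similarity as a stand-in for the unspecified "unsupervised text mining techniques", and the element names, descriptions, and the `suggest_mappings` helper are all hypothetical. The classifier and active-learning stages are not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) with smoothed IDF for a list of strings."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many descriptions each token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def suggest_mappings(source, target, top_k=1):
    """Rank target elements for each source element by description similarity."""
    src_names, tgt_names = list(source), list(target)
    vecs = tfidf_vectors(
        [source[n] for n in src_names] + [target[n] for n in tgt_names]
    )
    src_vecs, tgt_vecs = vecs[: len(src_names)], vecs[len(src_names):]
    suggestions = {}
    for name, sv in zip(src_names, src_vecs):
        ranked = sorted(
            ((cosine(sv, tv), t) for t, tv in zip(tgt_names, tgt_vecs)),
            reverse=True,
        )
        suggestions[name] = [(t, round(s, 3)) for s, t in ranked[:top_k]]
    return suggestions


# Hypothetical data dictionary entries (element name -> description).
source = {
    "PTAGE": "Age of the participant at baseline visit in years",
    "MMSCORE": "Mini mental state examination total score",
}
target = {
    "age": "Participant age in years",
    "mmse_total": "Total score on the mini mental state examination",
    "sex": "Participant sex",
}

suggestions = suggest_mappings(source, target)
for element, candidates in suggestions.items():
    print(element, "->", candidates)
```

In a fuller pipeline, similarity scores like these would more plausibly serve as features for a trained match classifier than as final decisions, with a reviewer confirming or rejecting the suggestions.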
format Online
Article
Text
id pubmed-4710756
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-4710756 2016-01-20 The GAAIN Entity Mapper: An Active-Learning System for Medical Data Mapping Ashish, Naveen Dewan, Peehoo Toga, Arthur W. Front Neuroinform Neuroscience Frontiers Media S.A.
2016-01-13 /pmc/articles/PMC4710756/ /pubmed/26793094 http://dx.doi.org/10.3389/fninf.2015.00030 Text en Copyright © 2016 Ashish, Dewan and Toga. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Neuroscience
Ashish, Naveen
Dewan, Peehoo
Toga, Arthur W.
The GAAIN Entity Mapper: An Active-Learning System for Medical Data Mapping
title The GAAIN Entity Mapper: An Active-Learning System for Medical Data Mapping
title_full The GAAIN Entity Mapper: An Active-Learning System for Medical Data Mapping
title_fullStr The GAAIN Entity Mapper: An Active-Learning System for Medical Data Mapping
title_full_unstemmed The GAAIN Entity Mapper: An Active-Learning System for Medical Data Mapping
title_short The GAAIN Entity Mapper: An Active-Learning System for Medical Data Mapping
title_sort gaain entity mapper: an active-learning system for medical data mapping
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4710756/
https://www.ncbi.nlm.nih.gov/pubmed/26793094
http://dx.doi.org/10.3389/fninf.2015.00030
work_keys_str_mv AT ashishnaveen thegaainentitymapperanactivelearningsystemformedicaldatamapping
AT dewanpeehoo thegaainentitymapperanactivelearningsystemformedicaldatamapping
AT togaarthurw thegaainentitymapperanactivelearningsystemformedicaldatamapping
AT ashishnaveen gaainentitymapperanactivelearningsystemformedicaldatamapping
AT dewanpeehoo gaainentitymapperanactivelearningsystemformedicaldatamapping
AT togaarthurw gaainentitymapperanactivelearningsystemformedicaldatamapping