
An approach for semantic integration of heterogeneous data sources

Integrating data from multiple heterogeneous data sources entails dealing with data distributed among heterogeneous information sources, which can be structured, semi-structured or unstructured, and providing the user with a unified view of these data. Thus, in general, gathering information is chal...


Bibliographic Details
Main Authors: Fusco, Giuseppe, Aversano, Lerina
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Data Science
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924686/
https://www.ncbi.nlm.nih.gov/pubmed/33816906
http://dx.doi.org/10.7717/peerj-cs.254
_version_ 1783659141199298560
author Fusco, Giuseppe
Aversano, Lerina
author_facet Fusco, Giuseppe
Aversano, Lerina
author_sort Fusco, Giuseppe
collection PubMed
description Integrating data from multiple heterogeneous data sources entails dealing with data distributed among heterogeneous information sources, which can be structured, semi-structured or unstructured, and providing the user with a unified view of these data. Thus, in general, gathering information is challenging, and one of the main reasons is that data sources are designed to support specific applications. Very often their structure is unknown to most users. Moreover, the stored data is often redundant, mixed with information needed only to support enterprise processes, and incomplete with respect to the business domain. Collecting, integrating, reconciling and efficiently extracting information from heterogeneous and autonomous data sources is regarded as a major challenge. In this paper, we present an approach for the semantic integration of heterogeneous data sources, DIF (Data Integration Framework), and a software prototype to support all aspects of a complex data integration process. The proposed approach is an ontology-based generalization of both Global-as-View and Local-as-View approaches. In particular, to overcome problems due to semantic heterogeneity and to support interoperability with external systems, ontologies are used as a conceptual schema to represent both data sources to be integrated and the global view.
format Online
Article
Text
id pubmed-7924686
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7924686 2021-04-02 An approach for semantic integration of heterogeneous data sources Fusco, Giuseppe Aversano, Lerina PeerJ Comput Sci Data Science PeerJ Inc.
2020-03-02 /pmc/articles/PMC7924686/ /pubmed/33816906 http://dx.doi.org/10.7717/peerj-cs.254 Text en ©2020 Fusco and Aversano https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
spellingShingle Data Science
Fusco, Giuseppe
Aversano, Lerina
An approach for semantic integration of heterogeneous data sources
title An approach for semantic integration of heterogeneous data sources
title_sort approach for semantic integration of heterogeneous data sources
topic Data Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924686/
https://www.ncbi.nlm.nih.gov/pubmed/33816906
http://dx.doi.org/10.7717/peerj-cs.254