An approach for semantic integration of heterogeneous data sources
Integrating data from multiple heterogeneous data sources entails dealing with data distributed among heterogeneous information sources, which can be structured, semi-structured or unstructured, and providing the user with a unified view of these data. Thus, in general, gathering information is chal...
Main Authors: | Fusco, Giuseppe; Aversano, Lerina |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2020 |
Subjects: | Data Science |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924686/ https://www.ncbi.nlm.nih.gov/pubmed/33816906 http://dx.doi.org/10.7717/peerj-cs.254 |
_version_ | 1783659141199298560 |
---|---|
author | Fusco, Giuseppe Aversano, Lerina |
author_facet | Fusco, Giuseppe Aversano, Lerina |
author_sort | Fusco, Giuseppe |
collection | PubMed |
description | Integrating data from multiple heterogeneous data sources entails dealing with data distributed among heterogeneous information sources, which can be structured, semi-structured or unstructured, and providing the user with a unified view of these data. Thus, in general, gathering information is challenging, and one of the main reasons is that data sources are designed to support specific applications. Very often their structure is unknown to the large part of users. Moreover, the stored data is often redundant, mixed with information only needed to support enterprise processes, and incomplete with respect to the business domain. Collecting, integrating, reconciling and efficiently extracting information from heterogeneous and autonomous data sources is regarded as a major challenge. In this paper, we present an approach for the semantic integration of heterogeneous data sources, DIF (Data Integration Framework), and a software prototype to support all aspects of a complex data integration process. The proposed approach is an ontology-based generalization of both Global-as-View and Local-as-View approaches. In particular, to overcome problems due to semantic heterogeneity and to support interoperability with external systems, ontologies are used as a conceptual schema to represent both data sources to be integrated and the global view. |
format | Online Article Text |
id | pubmed-7924686 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-79246862021-04-02 An approach for semantic integration of heterogeneous data sources Fusco, Giuseppe Aversano, Lerina PeerJ Comput Sci Data Science Integrating data from multiple heterogeneous data sources entails dealing with data distributed among heterogeneous information sources, which can be structured, semi-structured or unstructured, and providing the user with a unified view of these data. Thus, in general, gathering information is challenging, and one of the main reasons is that data sources are designed to support specific applications. Very often their structure is unknown to the large part of users. Moreover, the stored data is often redundant, mixed with information only needed to support enterprise processes, and incomplete with respect to the business domain. Collecting, integrating, reconciling and efficiently extracting information from heterogeneous and autonomous data sources is regarded as a major challenge. In this paper, we present an approach for the semantic integration of heterogeneous data sources, DIF (Data Integration Framework), and a software prototype to support all aspects of a complex data integration process. The proposed approach is an ontology-based generalization of both Global-as-View and Local-as-View approaches. In particular, to overcome problems due to semantic heterogeneity and to support interoperability with external systems, ontologies are used as a conceptual schema to represent both data sources to be integrated and the global view. PeerJ Inc. 2020-03-02 /pmc/articles/PMC7924686/ /pubmed/33816906 http://dx.doi.org/10.7717/peerj-cs.254 Text en ©2020 Fusco and Aversano https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
spellingShingle | Data Science Fusco, Giuseppe Aversano, Lerina An approach for semantic integration of heterogeneous data sources |
title | An approach for semantic integration of heterogeneous data sources |
title_full | An approach for semantic integration of heterogeneous data sources |
title_fullStr | An approach for semantic integration of heterogeneous data sources |
title_full_unstemmed | An approach for semantic integration of heterogeneous data sources |
title_short | An approach for semantic integration of heterogeneous data sources |
title_sort | approach for semantic integration of heterogeneous data sources |
topic | Data Science |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924686/ https://www.ncbi.nlm.nih.gov/pubmed/33816906 http://dx.doi.org/10.7717/peerj-cs.254 |
work_keys_str_mv | AT fuscogiuseppe anapproachforsemanticintegrationofheterogeneousdatasources AT aversanolerina anapproachforsemanticintegrationofheterogeneousdatasources AT fuscogiuseppe approachforsemanticintegrationofheterogeneousdatasources AT aversanolerina approachforsemanticintegrationofheterogeneousdatasources |