A review of data abstraction

Bibliographic Details
Main authors: Cima, Gianluca; Console, Marco; Lenzerini, Maurizio; Poggi, Antonella
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Artificial Intelligence
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10328546/
https://www.ncbi.nlm.nih.gov/pubmed/37426303
http://dx.doi.org/10.3389/frai.2023.1085754
author Cima, Gianluca
Console, Marco
Lenzerini, Maurizio
Poggi, Antonella
collection PubMed
description It is well known that Artificial Intelligence (AI), and in particular Machine Learning (ML), is not effective without good data preparation, as the recent wave of data-centric AI has also pointed out. Data preparation is the process of gathering, transforming and cleaning raw data prior to processing and analysis. Since data nowadays often reside in distributed and heterogeneous data sources, the first activity of data preparation requires collecting data from suitable data sources and data services. It is thus essential that providers describe their data services in a way that makes them compliant with the FAIR guiding principles, i.e., makes them automatically Findable, Accessible, Interoperable, and Reusable (FAIR). The notion of data abstraction has been introduced precisely to meet this need. Abstraction is a kind of reverse engineering task that automatically provides a semantic characterization of a data service made available by a provider. The goal of this paper is to review the results obtained so far in data abstraction by presenting the formal framework for its definition, reporting on the decidability and complexity of the main theoretical problems concerning abstraction, and discussing open issues and interesting directions for future research.
format Online
Article
Text
id pubmed-10328546
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
journal Front Artif Intell, Frontiers Media S.A., 2023-06-23
rights Copyright © 2023 Cima, Console, Lenzerini and Poggi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A review of data abstraction
topic Artificial Intelligence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10328546/
https://www.ncbi.nlm.nih.gov/pubmed/37426303
http://dx.doi.org/10.3389/frai.2023.1085754