Cargando…
Automatic transparency evaluation for open knowledge extraction systems
BACKGROUND: This paper proposes Cyrus, a new transparency evaluation framework, for Open Knowledge Extraction (OKE) systems. Cyrus is based on the state-of-the-art transparency models and linked data quality assessment dimensions. It brings together a comprehensive view of transparency dimensions fo...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2023
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10468861/ https://www.ncbi.nlm.nih.gov/pubmed/37653549 http://dx.doi.org/10.1186/s13326-023-00293-9 |
_version_ | 1785099316656865280 |
---|---|
author | Basereh, Maryam Caputo, Annalina Brennan, Rob |
author_facet | Basereh, Maryam Caputo, Annalina Brennan, Rob |
author_sort | Basereh, Maryam |
collection | PubMed |
description | BACKGROUND: This paper proposes Cyrus, a new transparency evaluation framework, for Open Knowledge Extraction (OKE) systems. Cyrus is based on the state-of-the-art transparency models and linked data quality assessment dimensions. It brings together a comprehensive view of transparency dimensions for OKE systems. The Cyrus framework is used to evaluate the transparency of three linked datasets, which are built from the same corpus by three state-of-the-art OKE systems. The evaluation is automatically performed using a combination of three state-of-the-art FAIRness (Findability, Accessibility, Interoperability, Reusability) assessment tools and a linked data quality evaluation framework, called Luzzu. This evaluation includes six Cyrus data transparency dimensions for which existing assessment tools could be identified. OKE systems extract structured knowledge from unstructured or semi-structured text in the form of linked data. These systems are fundamental components of advanced knowledge services. However, due to the lack of a transparency framework for OKE, most OKE systems are not transparent. This means that their processes and outcomes are not understandable and interpretable. A comprehensive framework sheds light on different aspects of transparency, allows comparison between the transparency of different systems by supporting the development of transparency scores, gives insight into the transparency weaknesses of the system, and ways to improve them. Automatic transparency evaluation helps with scalability and facilitates transparency assessment. The transparency problem has been identified as critical by the European Union Trustworthy Artificial Intelligence (AI) guidelines. In this paper, Cyrus provides the first comprehensive view of transparency dimensions for OKE systems by merging the perspectives of the FAccT (Fairness, Accountability, and Transparency), FAIR, and linked data quality research communities. RESULTS: In Cyrus, data transparency includes ten dimensions which are grouped in two categories. In this paper, six of these dimensions, i.e., provenance, interpretability, understandability, licensing, availability, interlinking have been evaluated automatically for three state-of-the-art OKE systems, using the state-of-the-art metrics and tools. Covid-on-the-Web is identified to have the highest mean transparency. CONCLUSIONS: This is the first research to study the transparency of OKE systems that provides a comprehensive set of transparency dimensions spanning ethics, trustworthy AI, and data quality approaches to transparency. It also demonstrates how to perform automated transparency evaluation that combines existing FAIRness and linked data quality assessment tools for the first time. We show that state-of-the-art OKE systems vary in the transparency of the linked data generated and that these differences can be automatically quantified leading to potential applications in trustworthy AI, compliance, data protection, data governance, and future OKE system design and testing. |
format | Online Article Text |
id | pubmed-10468861 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-104688612023-09-01 Automatic transparency evaluation for open knowledge extraction systems Basereh, Maryam Caputo, Annalina Brennan, Rob J Biomed Semantics Research BACKGROUND: This paper proposes Cyrus, a new transparency evaluation framework, for Open Knowledge Extraction (OKE) systems. Cyrus is based on the state-of-the-art transparency models and linked data quality assessment dimensions. It brings together a comprehensive view of transparency dimensions for OKE systems. The Cyrus framework is used to evaluate the transparency of three linked datasets, which are built from the same corpus by three state-of-the-art OKE systems. The evaluation is automatically performed using a combination of three state-of-the-art FAIRness (Findability, Accessibility, Interoperability, Reusability) assessment tools and a linked data quality evaluation framework, called Luzzu. This evaluation includes six Cyrus data transparency dimensions for which existing assessment tools could be identified. OKE systems extract structured knowledge from unstructured or semi-structured text in the form of linked data. These systems are fundamental components of advanced knowledge services. However, due to the lack of a transparency framework for OKE, most OKE systems are not transparent. This means that their processes and outcomes are not understandable and interpretable. A comprehensive framework sheds light on different aspects of transparency, allows comparison between the transparency of different systems by supporting the development of transparency scores, gives insight into the transparency weaknesses of the system, and ways to improve them. Automatic transparency evaluation helps with scalability and facilitates transparency assessment. The transparency problem has been identified as critical by the European Union Trustworthy Artificial Intelligence (AI) guidelines. In this paper, Cyrus provides the first comprehensive view of transparency dimensions for OKE systems by merging the perspectives of the FAccT (Fairness, Accountability, and Transparency), FAIR, and linked data quality research communities. RESULTS: In Cyrus, data transparency includes ten dimensions which are grouped in two categories. In this paper, six of these dimensions, i.e., provenance, interpretability, understandability, licensing, availability, interlinking have been evaluated automatically for three state-of-the-art OKE systems, using the state-of-the-art metrics and tools. Covid-on-the-Web is identified to have the highest mean transparency. CONCLUSIONS: This is the first research to study the transparency of OKE systems that provides a comprehensive set of transparency dimensions spanning ethics, trustworthy AI, and data quality approaches to transparency. It also demonstrates how to perform automated transparency evaluation that combines existing FAIRness and linked data quality assessment tools for the first time. We show that state-of-the-art OKE systems vary in the transparency of the linked data generated and that these differences can be automatically quantified leading to potential applications in trustworthy AI, compliance, data protection, data governance, and future OKE system design and testing. BioMed Central 2023-08-31 /pmc/articles/PMC10468861/ /pubmed/37653549 http://dx.doi.org/10.1186/s13326-023-00293-9 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Basereh, Maryam Caputo, Annalina Brennan, Rob Automatic transparency evaluation for open knowledge extraction systems |
title | Automatic transparency evaluation for open knowledge extraction systems |
title_full | Automatic transparency evaluation for open knowledge extraction systems |
title_fullStr | Automatic transparency evaluation for open knowledge extraction systems |
title_full_unstemmed | Automatic transparency evaluation for open knowledge extraction systems |
title_short | Automatic transparency evaluation for open knowledge extraction systems |
title_sort | automatic transparency evaluation for open knowledge extraction systems |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10468861/ https://www.ncbi.nlm.nih.gov/pubmed/37653549 http://dx.doi.org/10.1186/s13326-023-00293-9 |
work_keys_str_mv | AT baserehmaryam automatictransparencyevaluationforopenknowledgeextractionsystems AT caputoannalina automatictransparencyevaluationforopenknowledgeextractionsystems AT brennanrob automatictransparencyevaluationforopenknowledgeextractionsystems |