A solution and practice for combining multi-source heterogeneous data to construct enterprise knowledge graph

The knowledge graph is one of the essential infrastructures of artificial intelligence. It is a challenge for knowledge engineering to construct a high-quality domain knowledge graph for multi-source heterogeneous data. We propose a complete process framework for constructing a knowledge graph that...

Bibliographic Details
Main Authors: Yan, Chenwei; Fang, Xinyue; Huang, Xiaotong; Guo, Chenyi; Wu, Ji
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Big Data
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10569599/
https://www.ncbi.nlm.nih.gov/pubmed/37841897
http://dx.doi.org/10.3389/fdata.2023.1278153
_version_ 1785119581134651392
author Yan, Chenwei
Fang, Xinyue
Huang, Xiaotong
Guo, Chenyi
Wu, Ji
author_facet Yan, Chenwei
Fang, Xinyue
Huang, Xiaotong
Guo, Chenyi
Wu, Ji
author_sort Yan, Chenwei
collection PubMed
description The knowledge graph is one of the essential infrastructures of artificial intelligence. It is a challenge for knowledge engineering to construct a high-quality domain knowledge graph for multi-source heterogeneous data. We propose a complete process framework for constructing a knowledge graph that combines structured data and unstructured data, which includes data processing, information extraction, knowledge fusion, data storage, and update strategies, aiming to improve the quality of the knowledge graph and extend its life cycle. Specifically, we take the construction process of an enterprise knowledge graph as an example and integrate enterprise register information, litigation-related information, and enterprise announcement information to enrich the enterprise knowledge graph. For the unstructured text, we improve an existing model to extract triples, and the F1-score of our model reaches 72.77%. The number of nodes and edges in our constructed enterprise knowledge graph reaches 1,430,000 and 3,170,000, respectively. Furthermore, for each type of multi-source heterogeneous data, we apply corresponding methods and strategies for information extraction and data storage and carry out a detailed comparative analysis of graph databases. From the perspective of practical use, the informative enterprise knowledge graph and its timely update can serve many actual business needs. Our proposed enterprise knowledge graph has been deployed in HuaRong RongTong (Beijing) Technology Co., Ltd. and is used by the staff as a powerful tool for corporate due diligence. The key features are reported and analyzed in the case study. Overall, this paper provides an easy-to-follow solution and practice for domain knowledge graph construction, as well as demonstrating its application in corporate due diligence.
format Online
Article
Text
id pubmed-10569599
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-105695992023-10-13 A solution and practice for combining multi-source heterogeneous data to construct enterprise knowledge graph Yan, Chenwei Fang, Xinyue Huang, Xiaotong Guo, Chenyi Wu, Ji Front Big Data Big Data The knowledge graph is one of the essential infrastructures of artificial intelligence. It is a challenge for knowledge engineering to construct a high-quality domain knowledge graph for multi-source heterogeneous data. We propose a complete process framework for constructing a knowledge graph that combines structured data and unstructured data, which includes data processing, information extraction, knowledge fusion, data storage, and update strategies, aiming to improve the quality of the knowledge graph and extend its life cycle. Specifically, we take the construction process of an enterprise knowledge graph as an example and integrate enterprise register information, litigation-related information, and enterprise announcement information to enrich the enterprise knowledge graph. For the unstructured text, we improve existing model to extract triples and the F1-score of our model reached 72.77%. The number of nodes and edges in our constructed enterprise knowledge graph reaches 1,430,000 and 3,170,000, respectively. Furthermore, for each type of multi-source heterogeneous data, we apply corresponding methods and strategies for information extraction and data storage and carry out a detailed comparative analysis of graph databases. From the perspective of practical use, the informative enterprise knowledge graph and its timely update can serve many actual business needs. Our proposed enterprise knowledge graph has been deployed in HuaRong RongTong (Beijing) Technology Co., Ltd. and is used by the staff as a powerful tool for corporate due diligence. The key features are reported and analyzed in the case study. 
Overall, this paper provides an easy-to-follow solution and practice for domain knowledge graph construction, as well as demonstrating its application in corporate due diligence. Frontiers Media S.A. 2023-09-28 /pmc/articles/PMC10569599/ /pubmed/37841897 http://dx.doi.org/10.3389/fdata.2023.1278153 Text en Copyright © 2023 Yan, Fang, Huang, Guo and Wu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Big Data
Yan, Chenwei
Fang, Xinyue
Huang, Xiaotong
Guo, Chenyi
Wu, Ji
A solution and practice for combining multi-source heterogeneous data to construct enterprise knowledge graph
title A solution and practice for combining multi-source heterogeneous data to construct enterprise knowledge graph
title_full A solution and practice for combining multi-source heterogeneous data to construct enterprise knowledge graph
title_fullStr A solution and practice for combining multi-source heterogeneous data to construct enterprise knowledge graph
title_full_unstemmed A solution and practice for combining multi-source heterogeneous data to construct enterprise knowledge graph
title_short A solution and practice for combining multi-source heterogeneous data to construct enterprise knowledge graph
title_sort solution and practice for combining multi-source heterogeneous data to construct enterprise knowledge graph
topic Big Data
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10569599/
https://www.ncbi.nlm.nih.gov/pubmed/37841897
http://dx.doi.org/10.3389/fdata.2023.1278153
work_keys_str_mv AT yanchenwei asolutionandpracticeforcombiningmultisourceheterogeneousdatatoconstructenterpriseknowledgegraph
AT fangxinyue asolutionandpracticeforcombiningmultisourceheterogeneousdatatoconstructenterpriseknowledgegraph
AT huangxiaotong asolutionandpracticeforcombiningmultisourceheterogeneousdatatoconstructenterpriseknowledgegraph
AT guochenyi asolutionandpracticeforcombiningmultisourceheterogeneousdatatoconstructenterpriseknowledgegraph
AT wuji asolutionandpracticeforcombiningmultisourceheterogeneousdatatoconstructenterpriseknowledgegraph
AT yanchenwei solutionandpracticeforcombiningmultisourceheterogeneousdatatoconstructenterpriseknowledgegraph
AT fangxinyue solutionandpracticeforcombiningmultisourceheterogeneousdatatoconstructenterpriseknowledgegraph
AT huangxiaotong solutionandpracticeforcombiningmultisourceheterogeneousdatatoconstructenterpriseknowledgegraph
AT guochenyi solutionandpracticeforcombiningmultisourceheterogeneousdatatoconstructenterpriseknowledgegraph
AT wuji solutionandpracticeforcombiningmultisourceheterogeneousdatatoconstructenterpriseknowledgegraph