LOPDF: a framework for extracting and producing open data of scientific documents for smart digital libraries

BACKGROUND: Results of scientific experiments and research work, whether conducted by individuals or organizations, are published and shared with the scientific community in different types of scientific publications such as books, chapters, journals, articles, reference works and reference work entries...

Bibliographic Details
Main Author:	Aslam, Muhammad Ahtisham
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2021
Subjects:	Algorithms and Analysis of Algorithms
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8049134/ https://www.ncbi.nlm.nih.gov/pubmed/33954235 http://dx.doi.org/10.7717/peerj-cs.445

author	Aslam, Muhammad Ahtisham
collection	PubMed
description	BACKGROUND: Results of scientific experiments and research work, whether conducted by individuals or organizations, are published and shared with the scientific community in different types of scientific publications such as books, chapters, journals, articles, reference works and reference work entries. One aspect of these documents is their content and the other is their metadata. Metadata of scientific documents could be used to increase mutual cooperation, find people with common interests and research work, and find scientific documents in matching domains. The major obstacle to obtaining these benefits is that the metadata of scientific publications is available only in unstructured (or semi-structured) formats, so it cannot be used to ask smart queries that help in computing and performing different types of analysis on scientific publications data. In addition, acquisition and smart processing of publications data is a complicated as well as time- and resource-consuming task. METHODS: To address this problem we have developed a generic framework named the Linked Open Publications Data Framework (LOPDF). The LOPDF framework can be used to crawl, process, extract and produce machine-understandable data (i.e., LOD) about scientific publications from different publisher-specific sources such as portals, XML exports and websites. In this paper we present the architecture, process and algorithm that we developed to process textual publications data and to produce semantically enriched data as RDF datasets (i.e., open data). RESULTS: The resulting datasets can be used to make smart queries using the SPARQL protocol. We also present quantitative as well as qualitative analyses of the resulting datasets, which ultimately can be used to compute the research behavior of organizations in a rapidly growing knowledge society. Finally, we present the potential uses of producing and processing such open data of scientific publications, and show how the results of smart queries on the resulting open datasets can be used to compute the impact of, and perform different types of analysis on, scientific publications data.
format	Online Article Text
id	pubmed-8049134
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-8049134 2021-05-04 LOPDF: a framework for extracting and producing open data of scientific documents for smart digital libraries Aslam, Muhammad Ahtisham PeerJ Comput Sci Algorithms and Analysis of Algorithms PeerJ Inc. 2021-04-07 /pmc/articles/PMC8049134/ /pubmed/33954235 http://dx.doi.org/10.7717/peerj-cs.445 Text en ©2021 Aslam https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title	LOPDF: a framework for extracting and producing open data of scientific documents for smart digital libraries
topic	Algorithms and Analysis of Algorithms
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8049134/ https://www.ncbi.nlm.nih.gov/pubmed/33954235 http://dx.doi.org/10.7717/peerj-cs.445
