A Quantum Genetic Algorithm for Building a Semantic Textual Similarity Estimation Framework for Plagiarism Detection Applications
The majority of the recent research on text similarity has been focused on machine learning strategies to combat the problem in the educational environment. When the originality of an idea is copied, it increases the difficulty of using a plagiarism detection system in practice, and the system fails...
Main authors: | Darwish, Saad M.; Mhaimeed, Ibrahim Abdullah; Elzoghabi, Adel A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10530057/ https://www.ncbi.nlm.nih.gov/pubmed/37761570 http://dx.doi.org/10.3390/e25091271 |
_version_ | 1785111464225275904 |
---|---|
author | Darwish, Saad M. Mhaimeed, Ibrahim Abdullah Elzoghabi, Adel A. |
author_facet | Darwish, Saad M. Mhaimeed, Ibrahim Abdullah Elzoghabi, Adel A. |
author_sort | Darwish, Saad M. |
collection | PubMed |
description | The majority of the recent research on text similarity has been focused on machine learning strategies to combat the problem in the educational environment. When the originality of an idea is copied, it increases the difficulty of using a plagiarism detection system in practice, and the system fails. In cases like active-to-passive conversion, phrase structure changes, synonym substitution, and sentence reordering, the present approaches may not be adequate for plagiarism detection. In this article, semantic extraction and the quantum genetic algorithm (QGA) are integrated in a unified framework to identify idea plagiarism with the aim of enhancing the performance of existing methods in terms of detection accuracy and computational time. Semantic similarity measures, which use the WordNet database to extract semantic information, are used to capture a document’s idea. In addition, the QGA is adapted to identify the interconnected, cohesive sentences that effectively convey the source document’s main idea. QGAs are formulated using the quantum computing paradigm based on qubits and the superposition of states. By using the qubit chromosome as a representation rather than the more traditional binary, numeric, or symbolic representations, the QGA is able to express a linear superposition of solutions with the aim of increasing gene diversity. Due to its fast convergence and strong global search capacity, the QGA is well suited for a parallel structure. The proposed model has been assessed using a PAN 13-14 dataset, and the result indicates the model’s ability to achieve significant detection improvement over some of the compared models. The recommended PD model achieves an approximately 20%, 15%, and 10% increase for TPR, PPV, and F-Score compared to GA and hierarchical GA (HGA)-based PD methods, respectively. Furthermore, the accuracy rate rises by approximately 10–15% for each increase in the number of samples in the dataset. |
format | Online Article Text |
id | pubmed-10530057 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10530057 2023-09-28 A Quantum Genetic Algorithm for Building a Semantic Textual Similarity Estimation Framework for Plagiarism Detection Applications Darwish, Saad M. Mhaimeed, Ibrahim Abdullah Elzoghabi, Adel A. Entropy (Basel) Article The majority of the recent research on text similarity has been focused on machine learning strategies to combat the problem in the educational environment. When the originality of an idea is copied, it increases the difficulty of using a plagiarism detection system in practice, and the system fails. In cases like active-to-passive conversion, phrase structure changes, synonym substitution, and sentence reordering, the present approaches may not be adequate for plagiarism detection. In this article, semantic extraction and the quantum genetic algorithm (QGA) are integrated in a unified framework to identify idea plagiarism with the aim of enhancing the performance of existing methods in terms of detection accuracy and computational time. Semantic similarity measures, which use the WordNet database to extract semantic information, are used to capture a document’s idea. In addition, the QGA is adapted to identify the interconnected, cohesive sentences that effectively convey the source document’s main idea. QGAs are formulated using the quantum computing paradigm based on qubits and the superposition of states. By using the qubit chromosome as a representation rather than the more traditional binary, numeric, or symbolic representations, the QGA is able to express a linear superposition of solutions with the aim of increasing gene diversity. Due to its fast convergence and strong global search capacity, the QGA is well suited for a parallel structure. The proposed model has been assessed using a PAN 13-14 dataset, and the result indicates the model’s ability to achieve significant detection improvement over some of the compared models. The recommended PD model achieves an approximately 20%, 15%, and 10% increase for TPR, PPV, and F-Score compared to GA and hierarchical GA (HGA)-based PD methods, respectively. Furthermore, the accuracy rate rises by approximately 10–15% for each increase in the number of samples in the dataset. MDPI 2023-08-29 /pmc/articles/PMC10530057/ /pubmed/37761570 http://dx.doi.org/10.3390/e25091271 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Darwish, Saad M. Mhaimeed, Ibrahim Abdullah Elzoghabi, Adel A. A Quantum Genetic Algorithm for Building a Semantic Textual Similarity Estimation Framework for Plagiarism Detection Applications |
title | A Quantum Genetic Algorithm for Building a Semantic Textual Similarity Estimation Framework for Plagiarism Detection Applications |
title_full | A Quantum Genetic Algorithm for Building a Semantic Textual Similarity Estimation Framework for Plagiarism Detection Applications |
title_fullStr | A Quantum Genetic Algorithm for Building a Semantic Textual Similarity Estimation Framework for Plagiarism Detection Applications |
title_full_unstemmed | A Quantum Genetic Algorithm for Building a Semantic Textual Similarity Estimation Framework for Plagiarism Detection Applications |
title_short | A Quantum Genetic Algorithm for Building a Semantic Textual Similarity Estimation Framework for Plagiarism Detection Applications |
title_sort | quantum genetic algorithm for building a semantic textual similarity estimation framework for plagiarism detection applications |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10530057/ https://www.ncbi.nlm.nih.gov/pubmed/37761570 http://dx.doi.org/10.3390/e25091271 |
work_keys_str_mv | AT darwishsaadm aquantumgeneticalgorithmforbuildingasemantictextualsimilarityestimationframeworkforplagiarismdetectionapplications AT mhaimeedibrahimabdullah aquantumgeneticalgorithmforbuildingasemantictextualsimilarityestimationframeworkforplagiarismdetectionapplications AT elzoghabiadela aquantumgeneticalgorithmforbuildingasemantictextualsimilarityestimationframeworkforplagiarismdetectionapplications AT darwishsaadm quantumgeneticalgorithmforbuildingasemantictextualsimilarityestimationframeworkforplagiarismdetectionapplications AT mhaimeedibrahimabdullah quantumgeneticalgorithmforbuildingasemantictextualsimilarityestimationframeworkforplagiarismdetectionapplications AT elzoghabiadela quantumgeneticalgorithmforbuildingasemantictextualsimilarityestimationframeworkforplagiarismdetectionapplications |
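
The description field in this record says the QGA encodes each chromosome as qubits in superposition rather than as fixed bits. The following is a minimal, generic sketch of that idea and not the authors' implementation. Assumptions: real-valued amplitudes kept in the first quadrant, a fixed rotation step `delta`, and a toy "count the ones" fitness standing in for the paper's sentence-selection objective. All function names are hypothetical.

```python
import math
import random


def init_chromosome(n_genes):
    """Each gene is a qubit (alpha, beta) with alpha^2 + beta^2 = 1.
    Start in equal superposition so 0 and 1 are equally likely."""
    amp = 1.0 / math.sqrt(2.0)
    return [(amp, amp) for _ in range(n_genes)]


def observe(chromosome, rng):
    """Collapse every qubit to a classical bit: 1 with probability beta^2."""
    return [1 if rng.random() < beta * beta else 0 for _, beta in chromosome]


def rotate(chromosome, bits, best_bits, delta=0.05 * math.pi):
    """Rotate each qubit toward the matching bit of the best solution so far.
    delta is an assumed step size, not a value taken from the paper."""
    updated = []
    for (alpha, beta), bit, best in zip(chromosome, bits, best_bits):
        if bit == best:
            updated.append((alpha, beta))  # already agrees; leave unchanged
            continue
        theta = math.atan2(beta, alpha)            # current angle of the qubit
        theta += delta if best == 1 else -delta    # move toward |1> or |0>
        theta = min(max(theta, 0.0), math.pi / 2)  # stay in the first quadrant
        updated.append((math.cos(theta), math.sin(theta)))
    return updated


def run(n_genes=16, generations=200, seed=0):
    """Toy loop: fitness is the number of ones (a placeholder objective)."""
    rng = random.Random(seed)
    chromosome = init_chromosome(n_genes)
    best_bits = observe(chromosome, rng)
    for _ in range(generations):
        bits = observe(chromosome, rng)
        if sum(bits) > sum(best_bits):
            best_bits = bits
        chromosome = rotate(chromosome, bits, best_bits)
    return best_bits, chromosome
```

Because a single qubit chromosome yields a different bit string on every observation, one individual can sample many candidate solutions, which is the diversity argument the description makes. In a plagiarism-detection setting, each bit would presumably mark whether a sentence is selected, with fitness computed from a semantic similarity measure.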