
Proposed Framework for the Evaluation of Standalone Corpora Processing Systems: An Application to Arabic Corpora

Despite the accessibility of numerous online corpora, students and researchers engaged in the fields of Natural Language Processing (NLP), corpus linguistics, and language learning and teaching may encounter situations in which they need to develop their own corpora. Several commercial and free stan...


Bibliographic Details
Main authors: Al-Thubaity, Abdulmohsen; Al-Khalifa, Hend; Alqifari, Reem; Almazrua, Manal
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4294294/
https://www.ncbi.nlm.nih.gov/pubmed/25610910
http://dx.doi.org/10.1155/2014/602745
author Al-Thubaity, Abdulmohsen
Al-Khalifa, Hend
Alqifari, Reem
Almazrua, Manal
collection PubMed
description Despite the accessibility of numerous online corpora, students and researchers engaged in the fields of Natural Language Processing (NLP), corpus linguistics, and language learning and teaching may encounter situations in which they need to develop their own corpora. Several commercial and free standalone corpora processing systems are available to process such corpora. In this study, we first propose a framework for the evaluation of standalone corpora processing systems and then use it to evaluate seven freely available systems. The proposed framework considers the usability, functionality, and performance of the evaluated systems while taking into consideration their suitability for Arabic corpora. While the results show that most of the evaluated systems exhibited comparable usability scores, the scores for functionality and performance were substantially different with respect to support for the Arabic language and N-grams profile generation. The results of our evaluation will help potential users of the evaluated systems to choose the system that best meets their needs. More importantly, the results will help the developers of the evaluated systems to enhance their systems and developers of new corpora processing systems by providing them with a reference framework.
format Online
Article
Text
id pubmed-4294294
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-4294294 2015-01-21 ScientificWorldJournal Research Article Hindawi Publishing Corporation 2014 2014-12-31 /pmc/articles/PMC4294294/ /pubmed/25610910 http://dx.doi.org/10.1155/2014/602745 Text en Copyright © 2014 Abdulmohsen Al-Thubaity et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Proposed Framework for the Evaluation of Standalone Corpora Processing Systems: An Application to Arabic Corpora
topic Research Article