FNG-IE: an improved graph-based method for keyword extraction from scholarly big-data

Keyword extraction is essential for identifying influential keywords in large documents, as research repositories grow in volume day by day. The research community is drowning in data and starving for information. The keywords are the words that describe the theme of the whole docu...

Bibliographic Details
Main authors: Tahir, Noman, Asif, Muhammad, Ahmad, Shahbaz, Malik, Muhammad Sheraz Arshad, Aljuaid, Hanan, Butt, Muhammad Arif, Rehman, Mobashar
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7959634/
https://www.ncbi.nlm.nih.gov/pubmed/33817035
http://dx.doi.org/10.7717/peerj-cs.389
_version_ 1783664992397033472
author Tahir, Noman
Asif, Muhammad
Ahmad, Shahbaz
Malik, Muhammad Sheraz Arshad
Aljuaid, Hanan
Butt, Muhammad Arif
Rehman, Mobashar
author_facet Tahir, Noman
Asif, Muhammad
Ahmad, Shahbaz
Malik, Muhammad Sheraz Arshad
Aljuaid, Hanan
Butt, Muhammad Arif
Rehman, Mobashar
author_sort Tahir, Noman
collection PubMed
description Keyword extraction is essential for identifying influential keywords in large documents, as research repositories grow in volume day by day. The research community is drowning in data and starving for information. Keywords are the few words that precisely describe the theme of a whole document. Many state-of-the-art approaches are available for keyword extraction from large document collections; they fall into three types: statistical approaches, machine learning approaches, and graph-based methods. Machine learning approaches require a large training dataset that must be developed manually by domain experts, which is sometimes difficult to produce. This research therefore focused on enhancing state-of-the-art graph-based methods to extract keywords when a training dataset is unavailable. The handcrafted dataset, collected from impact-factor journals, was first converted into n-gram combinations ranging from unigrams to pentagrams, and traditional graph-based approaches were enhanced. The experiment was conducted on this handcrafted dataset, and all methods were applied to it. Domain experts performed a user study to evaluate the results. The results of every method were evaluated against the user study using precision, recall, and F-measure as evaluation metrics. The results showed that the proposed method (FNG-IE) performed well, scoring close to the machine learning approaches.
format Online
Article
Text
id pubmed-7959634
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-79596342021-04-02 FNG-IE: an improved graph-based method for keyword extraction from scholarly big-data Tahir, Noman Asif, Muhammad Ahmad, Shahbaz Malik, Muhammad Sheraz Arshad Aljuaid, Hanan Butt, Muhammad Arif Rehman, Mobashar PeerJ Comput Sci Emerging Technologies PeerJ Inc. 2021-03-11 /pmc/articles/PMC7959634/ /pubmed/33817035 http://dx.doi.org/10.7717/peerj-cs.389 Text en © 2021 Tahir et al. 
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
spellingShingle Emerging Technologies
Tahir, Noman
Asif, Muhammad
Ahmad, Shahbaz
Malik, Muhammad Sheraz Arshad
Aljuaid, Hanan
Butt, Muhammad Arif
Rehman, Mobashar
FNG-IE: an improved graph-based method for keyword extraction from scholarly big-data
title FNG-IE: an improved graph-based method for keyword extraction from scholarly big-data
title_full FNG-IE: an improved graph-based method for keyword extraction from scholarly big-data
title_fullStr FNG-IE: an improved graph-based method for keyword extraction from scholarly big-data
title_full_unstemmed FNG-IE: an improved graph-based method for keyword extraction from scholarly big-data
title_short FNG-IE: an improved graph-based method for keyword extraction from scholarly big-data
title_sort fng-ie: an improved graph-based method for keyword extraction from scholarly big-data
topic Emerging Technologies
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7959634/
https://www.ncbi.nlm.nih.gov/pubmed/33817035
http://dx.doi.org/10.7717/peerj-cs.389
work_keys_str_mv AT tahirnoman fngieanimprovedgraphbasedmethodforkeywordextractionfromscholarlybigdata
AT asifmuhammad fngieanimprovedgraphbasedmethodforkeywordextractionfromscholarlybigdata
AT ahmadshahbaz fngieanimprovedgraphbasedmethodforkeywordextractionfromscholarlybigdata
AT malikmuhammadsherazarshad fngieanimprovedgraphbasedmethodforkeywordextractionfromscholarlybigdata
AT aljuaidhanan fngieanimprovedgraphbasedmethodforkeywordextractionfromscholarlybigdata
AT buttmuhammadarif fngieanimprovedgraphbasedmethodforkeywordextractionfromscholarlybigdata
AT rehmanmobashar fngieanimprovedgraphbasedmethodforkeywordextractionfromscholarlybigdata
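
Illustrative note on the abstract: the description field mentions two concrete steps, converting text into n-gram combinations from unigram to pentagram, and scoring extracted keywords with precision, recall, and F-measure. The sketch below shows one plausible way to do each. It is not the FNG-IE method itself, whose graph construction and ranking are not described in this record. The function names `ngrams` and `precision_recall_f1`, whitespace tokenization, and exact-match comparison against an expert keyword list are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n_min: int = 1, n_max: int = 5) -> List[str]:
    """Return every contiguous n-gram of length n_min..n_max (assumed helper).

    With the defaults this covers unigrams through pentagrams.
    """
    out = []
    for n in range(n_min, n_max + 1):
        # A sequence of len(tokens) words has len(tokens) - n + 1 n-grams.
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


def precision_recall_f1(
    predicted: Iterable[str], gold: Iterable[str]
) -> Tuple[float, float, float]:
    """Score predicted keywords against a gold list using exact matching.

    Exact string matching is an assumption; the record does not say how
    the user study matched keywords.
    """
    predicted_set, gold_set = set(predicted), set(gold)
    true_positives = len(predicted_set & gold_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical input, tokenized on whitespace for simplicity.
    candidates = ngrams("graph based keyword extraction".split())
    print(candidates)
    print(precision_recall_f1(candidates, ["keyword extraction", "big data"]))
```

In a graph-based pipeline, candidates like these would typically become nodes to be ranked, and the evaluation would be run on the top-ranked ones only; both choices are left open here.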