A Combined Method for Usage of NLP Libraries Towards Analyzing Software Documents

Natural Language Processing (NLP) libraries are widely used to analyze software documents. The large number of available toolkits makes library selection a problem. Current work commonly selects an NLP library without stating objective reasons, which may pose threats to validity. And it is also not...

Bibliographic Details
Main authors: Cheng, Xinyun; Kong, Xianglong; Liao, Li; Li, Bixin
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7266445/
http://dx.doi.org/10.1007/978-3-030-49435-3_32
_version_ 1783541310847713280
author Cheng, Xinyun
Kong, Xianglong
Liao, Li
Li, Bixin
author_facet Cheng, Xinyun
Kong, Xianglong
Liao, Li
Li, Bixin
author_sort Cheng, Xinyun
collection PubMed
description Natural Language Processing (NLP) libraries are widely used to analyze software documents. The large number of available toolkits makes library selection a problem. Current work commonly selects an NLP library without stating objective reasons, which may pose threats to validity. It is also unclear whether the existing selection guidelines still hold for the latest versions. In this work, we propose a solution for NLP library selection when the effectiveness of the libraries is unknown: we use the NLP libraries together in a combined method. The combined method can exploit the strengths of different NLP libraries to obtain accurate results. The combination proceeds in two steps: document-level selection of a primary NLP library, and sentence-level overwriting. In document-level selection, the results are taken from the library that has the highest overall accuracy. In sentence-level overwriting, possible fine-grained improvements from the other libraries are extracted and used to overwrite the outputs of the primary library. We evaluate the combined method with 4 widely used NLP libraries and 200 documents from 3 different sources. The results show that the combined method generally outperforms all the studied NLP libraries in terms of accuracy. This finding means that the combined method can be used in place of an individual NLP library for more effective results.
format Online
Article
Text
id pubmed-7266445
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-72664452020-06-03 A Combined Method for Usage of NLP Libraries Towards Analyzing Software Documents Cheng, Xinyun Kong, Xianglong Liao, Li Li, Bixin Advanced Information Systems Engineering Article Natural Language Processing (NLP) libraries are widely used to analyze software documents. The large number of available toolkits makes library selection a problem. Current work commonly selects an NLP library without stating objective reasons, which may pose threats to validity. It is also unclear whether the existing selection guidelines still hold for the latest versions. In this work, we propose a solution for NLP library selection when the effectiveness of the libraries is unknown: we use the NLP libraries together in a combined method. The combined method can exploit the strengths of different NLP libraries to obtain accurate results. The combination proceeds in two steps: document-level selection of a primary NLP library, and sentence-level overwriting. In document-level selection, the results are taken from the library that has the highest overall accuracy. In sentence-level overwriting, possible fine-grained improvements from the other libraries are extracted and used to overwrite the outputs of the primary library. We evaluate the combined method with 4 widely used NLP libraries and 200 documents from 3 different sources. The results show that the combined method generally outperforms all the studied NLP libraries in terms of accuracy. This finding means that the combined method can be used in place of an individual NLP library for more effective results. 2020-05-09 /pmc/articles/PMC7266445/ http://dx.doi.org/10.1007/978-3-030-49435-3_32 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source.
These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Cheng, Xinyun
Kong, Xianglong
Liao, Li
Li, Bixin
A Combined Method for Usage of NLP Libraries Towards Analyzing Software Documents
title A Combined Method for Usage of NLP Libraries Towards Analyzing Software Documents
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7266445/
http://dx.doi.org/10.1007/978-3-030-49435-3_32
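
The two steps named in the description (document-level selection of a primary library, then sentence-level overwriting) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the record does not say how overall accuracy is estimated or when an output is overwritten. The sketch assumes, purely for illustration, that (1) every library produces one tag per token over an identical tokenization, (2) overall accuracy is approximated by agreement with the per-token majority vote, and (3) a primary tag is overwritten only when at least two other libraries unanimously agree on a different tag. The names `select_primary`, `combine`, and `majority_tags` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Tagging = List[List[str]]  # one list of tags per sentence


def majority_tags(outputs: Dict[str, Tagging], s: int) -> List[Optional[str]]:
    """Per-token majority tag for sentence s; None when no strict majority exists."""
    names = list(outputs)
    result: List[Optional[str]] = []
    for t in range(len(outputs[names[0]][s])):
        tag, count = Counter(outputs[n][s][t] for n in names).most_common(1)[0]
        result.append(tag if count > len(names) / 2 else None)
    return result


def select_primary(outputs: Dict[str, Tagging]) -> str:
    """Document-level step (assumed proxy): the library that agrees most
    often with the majority vote over the whole document."""
    scores = {name: 0 for name in outputs}
    n_sentences = len(next(iter(outputs.values())))
    for s in range(n_sentences):
        majority = majority_tags(outputs, s)
        for name in outputs:
            scores[name] += sum(
                1 for tag, m in zip(outputs[name][s], majority) if tag == m
            )
    return max(scores, key=scores.get)


def combine(outputs: Dict[str, Tagging]) -> Tuple[str, Tagging]:
    """Sentence-level step (assumed rule): overwrite a primary tag only when
    at least two other libraries all agree on a different tag."""
    primary = select_primary(outputs)
    others = [name for name in outputs if name != primary]
    combined: Tagging = []
    for s, sentence in enumerate(outputs[primary]):
        new_sentence = list(sentence)
        for t, tag in enumerate(sentence):
            alternatives = {outputs[name][s][t] for name in others}
            if len(others) >= 2 and len(alternatives) == 1 and tag not in alternatives:
                new_sentence[t] = next(iter(alternatives))
        combined.append(new_sentence)
    return primary, combined
```

Calling `combine` with a dictionary that maps each library name to its per-sentence tag lists returns the chosen primary library and the combined tagging. With ground-truth annotations available, the agreement proxy in `select_primary` could be replaced by measured accuracy.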