A Combined Method for Usage of NLP Libraries Towards Analyzing Software Documents
Main authors: | Cheng, Xinyun; Kong, Xianglong; Liao, Li; Li, Bixin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7266445/; http://dx.doi.org/10.1007/978-3-030-49435-3_32 |
Field | Value |
---|---|
author | Cheng, Xinyun; Kong, Xianglong; Liao, Li; Li, Bixin |
collection | PubMed |
description | Natural Language Processing (NLP) libraries are widely used to analyze software documents. The large number of available toolkits makes choosing an NLP library difficult. In current work, NLP libraries are often selected without objective justification, which may pose threats to validity. It is also unclear whether existing selection guidelines still hold for the latest library versions. In this work, we propose a solution to NLP library selection when the effectiveness of the libraries is unknown: we use the NLP libraries together in a combined method that exploits the strengths of each library to obtain accurate results. The combination is performed in two steps: document-level selection of a primary library, and sentence-level overwriting. In document-level selection, the results are taken from the library with the highest overall accuracy. In sentence-level overwriting, possible fine-grained improvements from the other libraries are extracted and used to overwrite the outputs of the primary library. We evaluate the combined method with 4 widely used NLP libraries and 200 documents from 3 different sources. The results show that the combined method generally outperforms all the studied NLP libraries in terms of accuracy, which suggests that it can be used in place of an individual NLP library to obtain more effective results. |
format | Online Article Text |
id | pubmed-7266445 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-7266445, 2020-06-03. Published in Advanced Information Systems Engineering (Article), 2020-05-09. © Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
title | A Combined Method for Usage of NLP Libraries Towards Analyzing Software Documents |
topic | Article |
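The two-step combination summarized in the description (document-level selection of a primary library, then sentence-level overwriting) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The library names (`libA`, `libB`, `libC`), the per-sentence output format (a tuple of tags), and the accuracy scores are hypothetical. The overwrite rule is also an assumption, because the record does not say how fine-grained improvements are detected: here a sentence is overwritten only when every non-primary library agrees on a result that differs from the primary library's. The sketch further assumes that overall accuracy scores are already available (for example, estimated on a labeled sample) and that all libraries segment the document into the same sentences.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical format: one tuple of tags per sentence, per library.
SentenceResult = Tuple[str, ...]
LibraryOutputs = Dict[str, List[SentenceResult]]


def select_primary(accuracy: Dict[str, float]) -> str:
    """Document-level step: pick the library with the highest overall accuracy."""
    return max(accuracy, key=lambda name: accuracy[name])


def combine(outputs: LibraryOutputs, accuracy: Dict[str, float]) -> List[SentenceResult]:
    """Start from the primary library's output, then overwrite single sentences.

    Assumed (hypothetical) overwrite rule: replace the primary result for a
    sentence only when all other libraries agree on a different result.
    """
    primary = select_primary(accuracy)
    combined = list(outputs[primary])
    others = [name for name in outputs if name != primary]
    if not others:
        # Only one library: nothing to overwrite with.
        return combined
    for i, primary_result in enumerate(combined):
        # Count how the non-primary libraries analyzed sentence i.
        votes = Counter(outputs[name][i] for name in others)
        candidate, count = votes.most_common(1)[0]
        if count == len(others) and candidate != primary_result:
            combined[i] = candidate
    return combined


if __name__ == "__main__":
    # Made-up outputs for a three-sentence document.
    outputs = {
        "libA": [("DT", "NN"), ("PRP", "VB"), ("JJ",)],
        "libB": [("DT", "NN"), ("PRP", "VBZ"), ("JJ",)],
        "libC": [("DT", "NN"), ("PRP", "VBZ"), ("RB",)],
    }
    accuracy = {"libA": 0.90, "libB": 0.85, "libC": 0.80}
    print(combine(outputs, accuracy))
```

With these made-up inputs, `libA` is chosen as the primary library; its second sentence is overwritten because `libB` and `libC` agree on `("PRP", "VBZ")`, and its third sentence is kept because the other two libraries disagree. A looser rule, such as a simple majority vote that includes the primary library, would be an equally plausible reading of the record.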