Understanding disciplinary vocabularies using a full-text enabled domain-independent term extraction approach

Bibliographic Details
Main authors: Yan, Erjia; Williams, Jake; Chen, Zheng
Format: Online Article, Text
Language: English
Published: Public Library of Science, 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5706669/
https://www.ncbi.nlm.nih.gov/pubmed/29186141
http://dx.doi.org/10.1371/journal.pone.0187762
Collection: PubMed
Description: Publication metadata help deliver rich analyses of scholarly communication. However, research concepts and ideas are more effectively expressed through unstructured fields such as full texts. Thus, the goals of this paper are to employ a full-text enabled method to extract terms relevant to disciplinary vocabularies and, through them, to understand the relationships between disciplines. This paper uses an efficient, domain-independent term extraction method to extract disciplinary vocabularies from a large multidisciplinary corpus of PLoS ONE publications. It finds a power-law pattern in the frequency distributions of terms present in each discipline, indicating a semantic richness potentially sufficient for further study and advanced analysis. The salient relationships among these vocabularies become apparent when a principal component analysis is applied. For example, Mathematics and Computer and Information Sciences were found to have similar vocabulary use patterns, along with Engineering and Physics, while Chemistry and the Social Sciences were found to exhibit contrasting vocabulary use patterns, along with the Earth Sciences and Chemistry. These results have implications for studies of scholarly communication, as scholars attempt to identify the epistemological cultures of disciplines, and because a full-text-based methodology could lead to machine learning applications in the automated classification of scholarly work according to disciplinary vocabularies.
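The description mentions two analyses: a power-law pattern in each discipline's term frequencies, and a principal component analysis (PCA) that reveals which disciplines share vocabulary use patterns. The sketch below illustrates both ideas in plain Python under stated assumptions; it is not the authors' pipeline. It assumes per-discipline term counts are already available from some term extractor (the paper's extraction method is not reproduced here). It uses a least-squares slope on log-log rank-frequency axes as a rough power-law check, although a maximum-likelihood fit is the more rigorous choice. It also computes first-component PCA scores by power iteration on the Gram matrix of centered relative-frequency profiles. The function names and the toy data are hypothetical.

```python
import math


def rank_frequency_slope(term_counts):
    """Least-squares slope of log(frequency) against log(rank).

    A roughly straight line with a negative slope on log-log axes is
    consistent with a power law; this is a rough diagnostic only.
    """
    freqs = sorted((c for c in term_counts.values() if c > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two terms with positive counts")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def first_principal_component_scores(vocab_by_discipline, iterations=200):
    """Score each discipline on the first principal component of its
    relative term-frequency profile (disciplines = observations,
    terms = variables). The sign of the scores is arbitrary."""
    names = sorted(vocab_by_discipline)
    terms = sorted({t for counts in vocab_by_discipline.values() for t in counts})
    # Relative frequency profile per discipline.
    rows = []
    for name in names:
        counts = vocab_by_discipline[name]
        total = sum(counts.values())
        rows.append([counts.get(t, 0) / total for t in terms])
    # Center each term column across disciplines.
    n = len(rows)
    means = [sum(row[j] for row in rows) / n for j in range(len(terms))]
    centered = [[row[j] - means[j] for j in range(len(terms))] for row in rows]
    # Gram matrix X X^T: its top eigenvector gives the PC1 score direction.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered] for ri in centered]
    # Power iteration; the start vector avoids the all-ones vector, which
    # lies in the null space of the Gram matrix after centering.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("no variance across disciplines")
        v = [x / norm for x in w]
    # Rayleigh quotient gives the top eigenvalue (squared singular value).
    eigenvalue = sum(v[i] * sum(gram[i][k] * v[k] for k in range(n)) for i in range(n))
    scale = math.sqrt(max(eigenvalue, 0.0))
    # PC1 scores = singular value times the left singular vector.
    return {name: scale * v[i] for i, name in enumerate(names)}


if __name__ == "__main__":
    # Hypothetical toy counts, not data from the paper.
    toy = {
        "field_A": {"theorem": 6, "proof": 4},
        "field_B": {"theorem": 5, "proof": 5},
        "field_C": {"cell": 6, "gene": 4},
        "field_D": {"cell": 5, "gene": 5},
    }
    scores = first_principal_component_scores(toy)
    # field_A/field_B share one sign and field_C/field_D the other;
    # which sign is which is arbitrary in PCA.
    print({name: round(s, 3) for name, s in scores.items()})
    zipf_like = {f"t{r}": 1000.0 / r for r in range(1, 51)}
    print(round(rank_frequency_slope(zipf_like), 6))  # -1.0
```

Running PCA through the small disciplines-by-disciplines Gram matrix keeps the sketch dependency-free; for a real vocabulary of many thousands of terms, a library SVD would be the practical choice.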
Record ID: pubmed-5706669
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One
Publication date: 2017-11-29
Rights: © 2017 Yan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Topic: Research Article