The textual characteristics of traditional and Open Access scientific journals are similar

BACKGROUND: Recent years have seen an increased amount of natural language processing (NLP) work on full text biomedical journal publications. Much of this work is done with Open Access journal articles. Such work assumes that Open Access articles are representative of biomedical publications in gen...

Bibliographic Details
Main Authors: Verspoor, Karin, Cohen, K Bretonnel, Hunter, Lawrence
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2714574/
https://www.ncbi.nlm.nih.gov/pubmed/19527520
http://dx.doi.org/10.1186/1471-2105-10-183
author Verspoor, Karin
Cohen, K Bretonnel
Hunter, Lawrence
author_facet Verspoor, Karin
Cohen, K Bretonnel
Hunter, Lawrence
author_sort Verspoor, Karin
collection PubMed
description BACKGROUND: Recent years have seen an increased amount of natural language processing (NLP) work on full text biomedical journal publications. Much of this work is done with Open Access journal articles. Such work assumes that Open Access articles are representative of biomedical publications in general and that methods developed for analysis of Open Access full text publications will generalize to the biomedical literature as a whole. If this assumption is wrong, the cost to the community will be large, including not just wasted resources, but also flawed science. This paper examines that assumption.
RESULTS: We collected two sets of documents, one consisting only of Open Access publications and the other consisting only of traditional journal publications. We examined them for differences in surface linguistic structures that have obvious consequences for the ease or difficulty of natural language processing and for differences in semantic content as reflected in lexical items. Regarding surface linguistic structures, we examined the incidence of conjunctions, negation, passives, and pronominal anaphora, and found that the two collections did not differ. We also examined the distribution of sentence lengths and found that both collections were characterized by the same mode. Regarding lexical items, we found that the Kullback-Leibler divergence between the two collections was low, and was lower than the divergence between either collection and a reference corpus. Where small differences did exist, log likelihood analysis showed that they were primarily in the area of formatting and in specific named entities.
CONCLUSION: We did not find structural or semantic differences between the Open Access and traditional journal collections.
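The two lexical comparisons named in the description, Kullback-Leibler divergence between word distributions and log likelihood analysis of individual word frequencies, can be illustrated with a minimal Python sketch. This is not the authors' code: the add-alpha smoothing, the two-term form of the log-likelihood statistic, the function names, and the toy token lists are all assumptions made for illustration, since the record does not say how the paper handled zero counts or which variant of the statistic it used.

```python
import math
from collections import Counter


def kl_divergence(p_counts, q_counts, alpha=1.0):
    """Return D(P || Q) in bits over the union vocabulary.

    Add-alpha smoothing is an assumption made here so that words
    missing from one collection do not produce a division by zero.
    """
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    divergence = 0.0
    for word in vocab:
        p = (p_counts.get(word, 0) + alpha) / p_total
        q = (q_counts.get(word, 0) + alpha) / q_total
        divergence += p * math.log2(p / q)
    return divergence


def log_likelihood(a, b, n1, n2):
    """Two-term log-likelihood (G2) score for one word.

    a, b   : occurrences of the word in corpus 1 and corpus 2
    n1, n2 : total token counts of corpus 1 and corpus 2
    """
    expected1 = n1 * (a + b) / (n1 + n2)
    expected2 = n2 * (a + b) / (n1 + n2)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected1)
    if b > 0:
        score += b * math.log(b / expected2)
    return 2.0 * score


# Hypothetical toy token lists standing in for the two collections.
open_access = Counter("the gene was expressed in the cell".split())
traditional = Counter("the protein was detected in the tissue".split())

print(kl_divergence(open_access, traditional))
print(log_likelihood(open_access["gene"], traditional["gene"],
                     sum(open_access.values()), sum(traditional.values())))
```

A divergence near zero indicates similar word distributions, and a high per-word score flags a word whose frequency differs most between the two collections. Note that the divergence is asymmetric, so D(P || Q) and D(Q || P) generally differ.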
format Text
id pubmed-2714574
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2714574 2009-07-24 The textual characteristics of traditional and Open Access scientific journals are similar Verspoor, Karin Cohen, K Bretonnel Hunter, Lawrence BMC Bioinformatics Research Article
BioMed Central 2009-06-15 /pmc/articles/PMC2714574/ /pubmed/19527520 http://dx.doi.org/10.1186/1471-2105-10-183 Text en Copyright © 2009 Verspoor et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Verspoor, Karin
Cohen, K Bretonnel
Hunter, Lawrence
The textual characteristics of traditional and Open Access scientific journals are similar
title The textual characteristics of traditional and Open Access scientific journals are similar
title_sort textual characteristics of traditional and open access scientific journals are similar
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2714574/
https://www.ncbi.nlm.nih.gov/pubmed/19527520
http://dx.doi.org/10.1186/1471-2105-10-183