The textual characteristics of traditional and Open Access scientific journals are similar
Main authors: | Verspoor, Karin; Cohen, K Bretonnel; Hunter, Lawrence |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2009 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2714574/ https://www.ncbi.nlm.nih.gov/pubmed/19527520 http://dx.doi.org/10.1186/1471-2105-10-183 |
author | Verspoor, Karin; Cohen, K Bretonnel; Hunter, Lawrence |
---|---|
collection | PubMed |
description | BACKGROUND: Recent years have seen an increased amount of natural language processing (NLP) work on full text biomedical journal publications. Much of this work is done with Open Access journal articles. Such work assumes that Open Access articles are representative of biomedical publications in general and that methods developed for analysis of Open Access full text publications will generalize to the biomedical literature as a whole. If this assumption is wrong, the cost to the community will be large, including not just wasted resources, but also flawed science. This paper examines that assumption. RESULTS: We collected two sets of documents, one consisting only of Open Access publications and the other consisting only of traditional journal publications. We examined them for differences in surface linguistic structures that have obvious consequences for the ease or difficulty of natural language processing and for differences in semantic content as reflected in lexical items. Regarding surface linguistic structures, we examined the incidence of conjunctions, negation, passives, and pronominal anaphora, and found that the two collections did not differ. We also examined the distribution of sentence lengths and found that both collections were characterized by the same mode. Regarding lexical items, we found that the Kullback-Leibler divergence between the two collections was low, and was lower than the divergence between either collection and a reference corpus. Where small differences did exist, log likelihood analysis showed that they were primarily in the area of formatting and in specific named entities. CONCLUSION: We did not find structural or semantic differences between the Open Access and traditional journal collections. |
format | Text |
id | pubmed-2714574 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2714574 2009-07-24 The textual characteristics of traditional and Open Access scientific journals are similar Verspoor, Karin Cohen, K Bretonnel Hunter, Lawrence BMC Bioinformatics Research Article BioMed Central 2009-06-15 /pmc/articles/PMC2714574/ /pubmed/19527520 http://dx.doi.org/10.1186/1471-2105-10-183 Text en Copyright © 2009 Verspoor et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | The textual characteristics of traditional and Open Access scientific journals are similar |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2714574/ https://www.ncbi.nlm.nih.gov/pubmed/19527520 http://dx.doi.org/10.1186/1471-2105-10-183 |
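The abstract reports two lexical comparisons: a Kullback-Leibler divergence between the Open Access and traditional collections (and against a reference corpus), and a log-likelihood analysis to locate the words where the collections differ. This record does not state the paper's tokenization, smoothing or thresholds. The sketch below is therefore only an illustration under stated assumptions: lowercased unigram counts, add-one smoothing over the combined vocabulary (without smoothing, any word seen in one collection but not the other makes KL infinite), and Dunning-style G² scores per word. All function names (`unigram_counts`, `kl_divergence`, `log_likelihood`, `top_keywords`) are hypothetical.

```python
import math
from collections import Counter


def unigram_counts(tokens):
    """Count lowercased tokens (hypothetical preprocessing choice)."""
    return Counter(t.lower() for t in tokens)


def kl_divergence(p_counts, q_counts, alpha=1.0):
    """KL(P || Q) in bits, with add-alpha smoothing over the shared vocabulary.

    Assumption: smoothing keeps every probability non-zero, so words that
    occur in only one collection do not make the divergence infinite.
    """
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    kl = 0.0
    for w in vocab:
        p = (p_counts[w] + alpha) / p_total  # Counter returns 0 for unseen words
        q = (q_counts[w] + alpha) / q_total
        kl += p * math.log2(p / q)
    return kl


def log_likelihood(a, b, c, d):
    """Dunning-style G2 for one word.

    a, b: the word's counts in collections 1 and 2.
    c, d: total token counts of collections 1 and 2.
    Terms with a zero observed count contribute 0 (0 * ln 0 taken as 0).
    """
    e1 = c * (a + b) / (c + d)  # expected count in collection 1
    e2 = d * (a + b) / (c + d)  # expected count in collection 2
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def top_keywords(p_counts, q_counts, n=5):
    """Return the n words with the highest G2, i.e. the largest differences."""
    c = sum(p_counts.values())
    d = sum(q_counts.values())
    scores = {
        w: log_likelihood(p_counts[w], q_counts[w], c, d)
        for w in set(p_counts) | set(q_counts)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    # Toy token lists standing in for the two collections and a reference corpus.
    open_access = "the protein binds the receptor and the gene is expressed".split()
    traditional = "the gene is expressed and the protein is not detected".split()
    reference = "the weather was mild and the market closed higher".split()
    oa, tr, ref = map(unigram_counts, (open_access, traditional, reference))
    print("KL(OA || traditional):", kl_divergence(oa, tr))
    print("KL(OA || reference):  ", kl_divergence(oa, ref))
    print("Top G2 words:", top_keywords(oa, tr))
```

KL divergence is asymmetric, so a comparison in the spirit of the abstract would typically report both directions or a symmetrized variant. G² is used here rather than a chi-squared test because it is a common choice for comparing word frequencies across corpora of different sizes, including rare words.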