
GenBank and PubMed: How connected are they?

BACKGROUND: GenBank® is a public repository of all publicly available molecular sequence data from a range of sources. In addition to relevant metadata (e.g., sequence description, source organism and taxonomy), publication information is recorded in the GenBank data file. The identification of li...


Bibliographic Details
Main authors: Miller, Holly; Norton, Catherine N; Sarkar, Indra Neil
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2704225/
https://www.ncbi.nlm.nih.gov/pubmed/19508734
http://dx.doi.org/10.1186/1756-0500-2-101
_version_ 1782168919713251328
author Miller, Holly
Norton, Catherine N
Sarkar, Indra Neil
author_facet Miller, Holly
Norton, Catherine N
Sarkar, Indra Neil
author_sort Miller, Holly
collection PubMed
description BACKGROUND: GenBank® is a public repository of all publicly available molecular sequence data from a range of sources. In addition to relevant metadata (e.g., sequence description, source organism and taxonomy), publication information is recorded in the GenBank data file. The identification of literature associated with a given molecular sequence may be an essential first step in developing research hypotheses. Although many of the publications associated with GenBank records may not be linked into or part of complementary literature databases (e.g., PubMed), GenBank records associated with literature indexed in Medline are identifiable as they contain PubMed identifiers (PMIDs). RESULTS: Here we show that an analysis of 87,116,501 GenBank sequence files reveals that 42% are associated with a publication or patent. Of these, 71% are associated with PMIDs and can therefore be linked to a citation record in the PubMed database. The remaining 29% of publication-associated GenBank entries either do not have PMIDs or cite a publication that is not currently indexed by PubMed. We also identify the journal titles that are linked through citations in the GenBank files to the largest number of sequences. CONCLUSION: Our analysis suggests that GenBank contains molecular sequences from a range of disciplines beyond biomedicine, the initial scope of PubMed. The findings thus suggest opportunities to develop mechanisms for integrating biological knowledge beyond the biomedical field.
format Text
id pubmed-2704225
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2704225 2009-07-01 GenBank and PubMed: How connected are they? Miller, Holly Norton, Catherine N Sarkar, Indra Neil BMC Res Notes Correspondence BioMed Central 2009-06-09 /pmc/articles/PMC2704225/ /pubmed/19508734 http://dx.doi.org/10.1186/1756-0500-2-101 Text en Copyright © 2009 Sarkar et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Correspondence
Miller, Holly
Norton, Catherine N
Sarkar, Indra Neil
GenBank and PubMed: How connected are they?
title GenBank and PubMed: How connected are they?
title_full GenBank and PubMed: How connected are they?
title_fullStr GenBank and PubMed: How connected are they?
title_full_unstemmed GenBank and PubMed: How connected are they?
title_short GenBank and PubMed: How connected are they?
title_sort genbank and pubmed: how connected are they?
topic Correspondence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2704225/
https://www.ncbi.nlm.nih.gov/pubmed/19508734
http://dx.doi.org/10.1186/1756-0500-2-101
work_keys_str_mv AT millerholly genbankandpubmedhowconnectedarethey
AT nortoncatherinen genbankandpubmedhowconnectedarethey
AT sarkarindraneil genbankandpubmedhowconnectedarethey
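
The abstract's linkage criterion (a GenBank record can be tied to PubMed when it carries a PMID) can be illustrated with a short sketch. This is not the authors' pipeline, which the record does not describe. It assumes the standard GenBank flat-file layout, in which each record starts with a LOCUS line, ends with a `//` line, and lists citations in REFERENCE blocks that may include a PUBMED line. The function name `classify_records` and the three category labels are hypothetical. As a simplification, any REFERENCE block is counted as a citation, so direct submissions and unpublished work are not separated from publications or patents as they are in the study.

```python
import re
from collections import Counter

# A PUBMED line in a GenBank flat-file REFERENCE block looks like
# "   PUBMED   19508734" (indented keyword, then the numeric identifier).
PUBMED_LINE = re.compile(r"^\s+PUBMED\s+(\d+)\s*$")


def classify_records(flat_file_text):
    """Tally GenBank records by citation status (hypothetical helper).

    Simplification assumed by this sketch: any REFERENCE block counts as
    a citation, including direct submissions and unpublished work.
    """
    counts = Counter()
    in_record = has_reference = has_pmid = False
    for line in flat_file_text.splitlines():
        if line.startswith("LOCUS"):
            # A new record begins; reset the per-record flags.
            in_record, has_reference, has_pmid = True, False, False
        elif line.startswith("REFERENCE"):
            has_reference = True
        elif PUBMED_LINE.match(line):
            has_pmid = True
        elif line.startswith("//") and in_record:
            # End of record: assign it to exactly one category.
            if has_pmid:
                counts["with_pmid"] += 1
            elif has_reference:
                counts["reference_without_pmid"] += 1
            else:
                counts["no_reference"] += 1
            in_record = False
    return counts
```

Calling `classify_records(open(path).read())` on a flat file (with `path` supplied by the reader) returns a `Counter`. The share of cited records that can be linked to PubMed is then `with_pmid / (with_pmid + reference_without_pmid)`, the quantity the abstract reports as 71%.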