pubmed2ensembl: A Resource for Mining the Biological Literature on Genes

BACKGROUND: The last two decades have witnessed a dramatic acceleration in the production of genomic sequence information and publication of biomedical articles. Despite the fact that genome sequence data and publications are two of the most heavily relied-upon sources of information for many biolog...

Bibliographic Details
Main Authors: Baran, Joachim; Gerner, Martin; Haeussler, Maximilian; Nenadic, Goran; Bergman, Casey M.
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3183000/
https://www.ncbi.nlm.nih.gov/pubmed/21980353
http://dx.doi.org/10.1371/journal.pone.0024716
_version_ 1782212964635377664
author Baran, Joachim
Gerner, Martin
Haeussler, Maximilian
Nenadic, Goran
Bergman, Casey M.
collection PubMed
description BACKGROUND: The last two decades have witnessed a dramatic acceleration in the production of genomic sequence information and publication of biomedical articles. Despite the fact that genome sequence data and publications are two of the most heavily relied-upon sources of information for many biologists, very little effort has been made to systematically integrate data from genomic sequences directly with the biological literature. For a limited number of model organisms dedicated teams manually curate publications about genes; however for species with no such dedicated staff many thousands of articles are never mapped to genes or genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: To overcome the lack of integration between genomic data and biological literature, we have developed pubmed2ensembl (http://www.pubmed2ensembl.org), an extension to the BioMart system that links over 2,000,000 articles in PubMed to nearly 150,000 genes in Ensembl from 50 species. We use several sources of curated (e.g., Entrez Gene) and automatically generated (e.g., gene names extracted through text-mining on MEDLINE records) sources of gene-publication links, allowing users to filter and combine different data sources to suit their individual needs for information extraction and biological discovery. In addition to extending the Ensembl BioMart database to include published information on genes, we also implemented a scripting language for automated BioMart construction and a novel BioMart interface that allows text-based queries to be performed against PubMed and PubMed Central documents in conjunction with constraints on genomic features. Finally, we illustrate the potential of pubmed2ensembl through typical use cases that involve integrated queries across the biomedical literature and genomic data. 
CONCLUSION/SIGNIFICANCE: By allowing biologists to find the relevant literature on specific genomic regions or sets of functionally related genes more easily, pubmed2ensembl offers a much-needed genome informatics inspired solution to accessing the ever-increasing biomedical literature.
format Online
Article
Text
id pubmed-3183000
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3183000 2011-10-06 pubmed2ensembl: A Resource for Mining the Biological Literature on Genes Baran, Joachim Gerner, Martin Haeussler, Maximilian Nenadic, Goran Bergman, Casey M. PLoS One Research Article Public Library of Science 2011-09-29 /pmc/articles/PMC3183000/ /pubmed/21980353 http://dx.doi.org/10.1371/journal.pone.0024716 Text en Baran et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title pubmed2ensembl: A Resource for Mining the Biological Literature on Genes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3183000/
https://www.ncbi.nlm.nih.gov/pubmed/21980353
http://dx.doi.org/10.1371/journal.pone.0024716