pubmed2ensembl: A Resource for Mining the Biological Literature on Genes
Main authors: | Baran, Joachim; Gerner, Martin; Haeussler, Maximilian; Nenadic, Goran; Bergman, Casey M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3183000/ https://www.ncbi.nlm.nih.gov/pubmed/21980353 http://dx.doi.org/10.1371/journal.pone.0024716 |
author | Baran, Joachim; Gerner, Martin; Haeussler, Maximilian; Nenadic, Goran; Bergman, Casey M. |
---|---|
collection | PubMed |
description | BACKGROUND: The last two decades have witnessed a dramatic acceleration in the production of genomic sequence information and publication of biomedical articles. Despite the fact that genome sequence data and publications are two of the most heavily relied-upon sources of information for many biologists, very little effort has been made to systematically integrate data from genomic sequences directly with the biological literature. For a limited number of model organisms, dedicated teams manually curate publications about genes; however, for species with no such dedicated staff, many thousands of articles are never mapped to genes or genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: To overcome the lack of integration between genomic data and biological literature, we have developed pubmed2ensembl (http://www.pubmed2ensembl.org), an extension to the BioMart system that links over 2,000,000 articles in PubMed to nearly 150,000 genes in Ensembl from 50 species. We use several curated (e.g., Entrez Gene) and automatically generated (e.g., gene names extracted through text mining of MEDLINE records) sources of gene-publication links, allowing users to filter and combine different data sources to suit their individual needs for information extraction and biological discovery. In addition to extending the Ensembl BioMart database to include published information on genes, we also implemented a scripting language for automated BioMart construction and a novel BioMart interface that allows text-based queries to be performed against PubMed and PubMed Central documents in conjunction with constraints on genomic features. Finally, we illustrate the potential of pubmed2ensembl through typical use cases that involve integrated queries across the biomedical literature and genomic data. CONCLUSION/SIGNIFICANCE: By allowing biologists to find the relevant literature on specific genomic regions or sets of functionally related genes more easily, pubmed2ensembl offers a much-needed, genome-informatics-inspired solution to accessing the ever-increasing biomedical literature. |
format | Online Article Text |
id | pubmed-3183000 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
journal | PLoS One |
publication date | 2011-09-29 |
license | Baran et al.; CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | pubmed2ensembl: A Resource for Mining the Biological Literature on Genes |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3183000/ https://www.ncbi.nlm.nih.gov/pubmed/21980353 http://dx.doi.org/10.1371/journal.pone.0024716 |
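The abstract notes that pubmed2ensembl lets users filter and combine curated and text-mined sources of gene-publication links. As an illustration only, the sketch below assumes the links are already available locally as (gene, PMID) pairs grouped by source; the function name `combine_sources`, the source labels, and the identifiers are hypothetical and are not part of the pubmed2ensembl or BioMart interfaces. Requiring a link to be supported by one selected source gives the union of those sources, while requiring support from all of them gives their intersection.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation of a gene-publication link: (gene ID, PubMed ID).
Link = Tuple[str, str]


def combine_sources(
    sources: Dict[str, Iterable[Link]],
    selected: Iterable[str],
    min_support: int = 1,
) -> Dict[str, Set[str]]:
    """Merge links from the selected sources into a gene -> PMIDs mapping.

    A link is kept only if at least `min_support` of the selected sources
    report it: min_support=1 yields the union of the selected sources and
    min_support=len(selected) yields their intersection.
    """
    support: Dict[Link, int] = defaultdict(int)
    for name in selected:
        # set() so that a source listing the same link twice counts once.
        for link in set(sources[name]):
            support[link] += 1

    genes: Dict[str, Set[str]] = defaultdict(set)
    for (gene, pmid), count in support.items():
        if count >= min_support:
            genes[gene].add(pmid)
    return dict(genes)


if __name__ == "__main__":
    # Placeholder identifiers, not real genes or articles.
    sources = {
        "curated": [("GENE_A", "PMID_1"), ("GENE_B", "PMID_2")],
        "text_mined": [("GENE_A", "PMID_1"), ("GENE_A", "PMID_3")],
    }
    both = ["curated", "text_mined"]
    print(sorted(combine_sources(sources, both)))  # ['GENE_A', 'GENE_B']
    print(combine_sources(sources, both, min_support=2))  # {'GENE_A': {'PMID_1'}}
```

Counting support per link, rather than hard-coding union or intersection, keeps both behaviours (and intermediate thresholds when more than two sources are selected) in a single parameter.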