Mining the Gene Wiki for functional genomic knowledge

BACKGROUND: Ontology-based gene annotations are important tools for organizing and analyzing genome-scale biological data. Collecting these annotations is a valuable but costly endeavor. The Gene Wiki makes use of Wikipedia as a low-cost, mass-collaborative platform for assembling text-based gene an...


Bibliographic Details
Main authors: Good, Benjamin M, Howe, Douglas G, Lin, Simon M, Kibbe, Warren A, Su, Andrew I
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3271090/
https://www.ncbi.nlm.nih.gov/pubmed/22165947
http://dx.doi.org/10.1186/1471-2164-12-603
author Good, Benjamin M
Howe, Douglas G
Lin, Simon M
Kibbe, Warren A
Su, Andrew I
collection PubMed
description BACKGROUND: Ontology-based gene annotations are important tools for organizing and analyzing genome-scale biological data. Collecting these annotations is a valuable but costly endeavor. The Gene Wiki makes use of Wikipedia as a low-cost, mass-collaborative platform for assembling text-based gene annotations. The Gene Wiki is comprised of more than 10,000 review articles, each describing one human gene. The goal of this study is to define and assess a computational strategy for translating the text of Gene Wiki articles into ontology-based gene annotations. We specifically explore the generation of structured annotations using the Gene Ontology and the Human Disease Ontology. RESULTS: Our system produced 2,983 candidate gene annotations using the Disease Ontology and 11,022 candidate annotations using the Gene Ontology from the text of the Gene Wiki. Based on manual evaluations and comparisons to reference annotation sets, we estimate a precision of 90-93% for the Disease Ontology annotations and 48-64% for the Gene Ontology annotations. We further demonstrate that this data set can systematically improve the results from gene set enrichment analyses. CONCLUSIONS: The Gene Wiki is a rapidly growing corpus of text focused on human gene function. Here, we demonstrate that the Gene Wiki can be a powerful resource for generating ontology-based gene annotations. These annotations can be used immediately to improve workflows for building curated gene annotation databases and knowledge-based statistical analyses.
format Online
Article
Text
id pubmed-3271090
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3271090 2012-02-08 Mining the Gene Wiki for functional genomic knowledge Good, Benjamin M Howe, Douglas G Lin, Simon M Kibbe, Warren A Su, Andrew I BMC Genomics Research Article BioMed Central 2011-12-13 /pmc/articles/PMC3271090/ /pubmed/22165947 http://dx.doi.org/10.1186/1471-2164-12-603 Text en Copyright ©2011 Good et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Mining the Gene Wiki for functional genomic knowledge
topic Research Article