Text mining and manual curation of chemical-gene-disease networks for the Comparative Toxicogenomics Database (CTD)

BACKGROUND: The Comparative Toxicogenomics Database (CTD) is a publicly available resource that promotes understanding about the etiology of environmental diseases. It provides manually curated chemical-gene/protein interactions and chemical- and gene-disease relationships from the peer-reviewed, pu...

Bibliographic Details
Main authors: Wiegers, Thomas C, Davis, Allan Peter, Cohen, K Bretonnel, Hirschman, Lynette, Mattingly, Carolyn J
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2768719/
https://www.ncbi.nlm.nih.gov/pubmed/19814812
http://dx.doi.org/10.1186/1471-2105-10-326
author Wiegers, Thomas C
Davis, Allan Peter
Cohen, K Bretonnel
Hirschman, Lynette
Mattingly, Carolyn J
author_sort Wiegers, Thomas C
collection PubMed
description BACKGROUND: The Comparative Toxicogenomics Database (CTD) is a publicly available resource that promotes understanding about the etiology of environmental diseases. It provides manually curated chemical-gene/protein interactions and chemical- and gene-disease relationships from the peer-reviewed, published literature. The goals of the research reported here were to establish a baseline analysis of current CTD curation, develop a text-mining prototype from readily available open source components, and evaluate its potential value in augmenting curation efficiency and increasing data coverage. RESULTS: Prototype text-mining applications were developed and evaluated using a CTD data set consisting of manually curated molecular interactions and relationships from 1,600 documents. Preliminary results indicated that the prototype found 80% of the gene, chemical, and disease terms appearing in curated interactions. These terms were used to re-rank documents for curation, resulting in increases in mean average precision (63% for the baseline vs. 73% for a rule-based re-ranking), and in the correlation coefficient of rank vs. number of curatable interactions per document (baseline 0.14 vs. 0.38 for the rule-based re-ranking). CONCLUSION: This text-mining project is unique in its integration of existing tools into a single workflow with direct application to CTD. We performed a baseline assessment of the inter-curator consistency and coverage in CTD, which allowed us to measure the potential of these integrated tools to improve prioritization of journal articles for manual curation. Our study presents a feasible and cost-effective approach for developing a text mining solution to enhance manual curation throughput and efficiency.
format Text
id pubmed-2768719
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2768719 2009-10-28 BMC Bioinformatics Research Article BioMed Central 2009-10-08 /pmc/articles/PMC2768719/ /pubmed/19814812 http://dx.doi.org/10.1186/1471-2105-10-326 Text en Copyright © 2009 Wiegers et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Text mining and manual curation of chemical-gene-disease networks for the Comparative Toxicogenomics Database (CTD)
topic Research Article
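
The abstract in this record reports document re-ranking results as mean average precision (63% for the baseline vs. 73% for the rule-based re-ranking). As a reading aid, the following is a minimal sketch of how mean average precision is commonly computed for ranked lists. It is not taken from the article: the function names, the boolean "curatable" labels, and the choice to normalize by the number of relevant documents present in each list are all assumptions made for illustration, and the toy rankings are invented rather than CTD data.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision for one ranked list.

    relevance[i] is True when the document at rank i + 1 is relevant
    (here, hypothetically, "curatable"). Assumes every relevant
    document appears somewhere in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: relevant documents seen so far / rank.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-list average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Invented toy rankings (not CTD data): the same four documents,
# before and after a hypothetical re-ranking.
baseline = [False, True, False, True]
reranked = [True, True, False, False]

print(average_precision(baseline))
print(average_precision(reranked))
print(mean_average_precision([baseline, reranked]))
```

Moving relevant documents toward the top of the list raises average precision, which is the kind of effect the re-ranking figures in the abstract describe.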