Text mining and manual curation of chemical-gene-disease networks for the Comparative Toxicogenomics Database (CTD)
BACKGROUND: The Comparative Toxicogenomics Database (CTD) is a publicly available resource that promotes understanding of the etiology of environmental diseases. It provides manually curated chemical-gene/protein interactions and chemical- and gene-disease relationships from the peer-reviewed, pu...
Main authors: | Wiegers, Thomas C; Davis, Allan Peter; Cohen, K Bretonnel; Hirschman, Lynette; Mattingly, Carolyn J |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2009 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2768719/ https://www.ncbi.nlm.nih.gov/pubmed/19814812 http://dx.doi.org/10.1186/1471-2105-10-326 |
_version_ | 1782173498060308480 |
---|---|
author | Wiegers, Thomas C; Davis, Allan Peter; Cohen, K Bretonnel; Hirschman, Lynette; Mattingly, Carolyn J |
author_sort | Wiegers, Thomas C |
collection | PubMed |
description | BACKGROUND: The Comparative Toxicogenomics Database (CTD) is a publicly available resource that promotes understanding of the etiology of environmental diseases. It provides manually curated chemical-gene/protein interactions and chemical- and gene-disease relationships from the peer-reviewed, published literature. The goals of the research reported here were to establish a baseline analysis of current CTD curation, develop a text-mining prototype from readily available open source components, and evaluate its potential value in augmenting curation efficiency and increasing data coverage. RESULTS: Prototype text-mining applications were developed and evaluated using a CTD data set consisting of manually curated molecular interactions and relationships from 1,600 documents. Preliminary results indicated that the prototype found 80% of the gene, chemical, and disease terms appearing in curated interactions. These terms were used to re-rank documents for curation, resulting in increases in mean average precision (63% for the baseline vs. 73% for a rule-based re-ranking), and in the correlation coefficient of rank vs. number of curatable interactions per document (baseline 0.14 vs. 0.38 for the rule-based re-ranking). CONCLUSION: This text-mining project is unique in its integration of existing tools into a single workflow with direct application to CTD. We performed a baseline assessment of the inter-curator consistency and coverage in CTD, which allowed us to measure the potential of these integrated tools to improve prioritization of journal articles for manual curation. Our study presents a feasible and cost-effective approach for developing a text-mining solution to enhance manual curation throughput and efficiency. |
format | Text |
id | pubmed-2768719 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2768719 2009-10-28. BMC Bioinformatics, Research Article. BioMed Central, 2009-10-08. Text en. Copyright © 2009 Wiegers et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Text mining and manual curation of chemical-gene-disease networks for the Comparative Toxicogenomics Database (CTD) |
topic | Research Article |
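
The abstract reports two ranking metrics: mean average precision (MAP) and a correlation coefficient between document rank and the number of curatable interactions per document. This record does not state the exact formulas used, so the following is only a minimal Python sketch of the standard definitions under explicit assumptions: a document counts as "relevant" if it contains at least one curatable interaction, the correlation is taken to be Pearson's, and rank positions are reversed so that placing interaction-rich documents first gives a positive coefficient. All function names and the toy counts are hypothetical; they are neither CTD data nor the authors' code.

```python
from statistics import mean


def average_precision(relevance):
    """Average precision of one ranked list.

    relevance[i] is True if the document at rank i+1 is relevant
    (assumed here: it has at least one curatable interaction).
    Assumes the list contains every relevant document, so dividing by
    the number of relevant documents found equals dividing by the total.
    """
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at this relevant rank
    return mean(precisions) if precisions else 0.0


def mean_average_precision(rankings):
    """MAP: the mean of average precision over several ranked lists."""
    return mean(average_precision(r) for r in rankings)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def rank_interaction_correlation(interaction_counts):
    """Correlate ranking position with curatable interactions per document.

    interaction_counts[i] is the number of curatable interactions in the
    document ranked at position i+1. Positions are encoded as n, n-1, ..., 1
    (an assumption) so that an interaction-rich top of the list scores high.
    """
    n = len(interaction_counts)
    scores = [n - i for i in range(n)]
    return pearson(scores, interaction_counts)


# Toy example with invented counts (hypothetical, not CTD data).
baseline_counts = [0, 2, 0, 1, 3]
reranked_counts = [3, 2, 1, 0, 0]
for name, counts in [("baseline", baseline_counts), ("re-ranked", reranked_counts)]:
    relevance = [c > 0 for c in counts]
    print(name, mean_average_precision([relevance]), rank_interaction_correlation(counts))
```

In this toy setup, moving the documents with interactions to the top raises both numbers, which is the direction of change the abstract reports; the magnitudes here say nothing about the study's results.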