Text mining and manual curation of chemical-gene-disease networks for the Comparative Toxicogenomics Database (CTD)

BACKGROUND: The Comparative Toxicogenomics Database (CTD) is a publicly available resource that promotes understanding about the etiology of environmental diseases. It provides manually curated chemical-gene/protein interactions and chemical- and gene-disease relationships from the peer-reviewed, pu...

Bibliographic Details
Main Authors:	Wiegers, Thomas C; Davis, Allan Peter; Cohen, K Bretonnel; Hirschman, Lynette; Mattingly, Carolyn J
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2768719/ https://www.ncbi.nlm.nih.gov/pubmed/19814812 http://dx.doi.org/10.1186/1471-2105-10-326

_version_	1782173498060308480
author	Wiegers, Thomas C; Davis, Allan Peter; Cohen, K Bretonnel; Hirschman, Lynette; Mattingly, Carolyn J
author_sort	Wiegers, Thomas C
collection	PubMed
description	BACKGROUND: The Comparative Toxicogenomics Database (CTD) is a publicly available resource that promotes understanding about the etiology of environmental diseases. It provides manually curated chemical-gene/protein interactions and chemical- and gene-disease relationships from the peer-reviewed, published literature. The goals of the research reported here were to establish a baseline analysis of current CTD curation, develop a text-mining prototype from readily available open source components, and evaluate its potential value in augmenting curation efficiency and increasing data coverage. RESULTS: Prototype text-mining applications were developed and evaluated using a CTD data set consisting of manually curated molecular interactions and relationships from 1,600 documents. Preliminary results indicated that the prototype found 80% of the gene, chemical, and disease terms appearing in curated interactions. These terms were used to re-rank documents for curation, resulting in increases in mean average precision (63% for the baseline vs. 73% for a rule-based re-ranking), and in the correlation coefficient of rank vs. number of curatable interactions per document (baseline 0.14 vs. 0.38 for the rule-based re-ranking). CONCLUSION: This text-mining project is unique in its integration of existing tools into a single workflow with direct application to CTD. We performed a baseline assessment of the inter-curator consistency and coverage in CTD, which allowed us to measure the potential of these integrated tools to improve prioritization of journal articles for manual curation. Our study presents a feasible and cost-effective approach for developing a text mining solution to enhance manual curation throughput and efficiency.
format	Text
id	pubmed-2768719
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2768719 2009-10-28 BMC Bioinformatics Research Article BioMed Central 2009-10-08 /pmc/articles/PMC2768719/ /pubmed/19814812 http://dx.doi.org/10.1186/1471-2105-10-326 Text en Copyright © 2009 Wiegers et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Text mining and manual curation of chemical-gene-disease networks for the Comparative Toxicogenomics Database (CTD)
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2768719/ https://www.ncbi.nlm.nih.gov/pubmed/19814812 http://dx.doi.org/10.1186/1471-2105-10-326
