
Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information

The Comparative Toxicogenomics Database (CTD) contains manually curated literature that describes chemical–gene interactions, chemical–disease relationships and gene–disease relationships. Finding articles containing this information is the first and an important step to assist manual curation effic...


Bibliographic Details
Main authors: Kim, Sun; Kim, Won; Wei, Chih-Hsuan; Lu, Zhiyong; Wilbur, W. John
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects: BioCreative Virtual Issue
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3500521/
https://www.ncbi.nlm.nih.gov/pubmed/23160415
http://dx.doi.org/10.1093/database/bas042
author Kim, Sun
Kim, Won
Wei, Chih-Hsuan
Lu, Zhiyong
Wilbur, W. John
author_sort Kim, Sun
collection PubMed
description The Comparative Toxicogenomics Database (CTD) contains manually curated literature that describes chemical–gene interactions, chemical–disease relationships and gene–disease relationships. Finding articles containing this information is the first and an important step to assist manual curation efficiency. However, the complex nature of named entities and their relationships make it challenging to choose relevant articles. In this article, we introduce a machine learning framework for prioritizing CTD-relevant articles based on our prior system for the protein–protein interaction article classification task in BioCreative III. To address new challenges in the CTD task, we explore a new entity identification method for genes, chemicals and diseases. In addition, latent topics are analyzed and used as a feature type to overcome the small size of the training set. Applied to the BioCreative 2012 Triage dataset, our method achieved 0.8030 mean average precision (MAP) in the official runs, resulting in the top MAP system among participants. Integrated with PubTator, a Web interface for annotating biomedical literature, the proposed system also received a positive review from the CTD curation team.
format Online
Article
Text
id pubmed-3500521
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-35005212012-11-19 Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information Kim, Sun Kim, Won Wei, Chih-Hsuan Lu, Zhiyong Wilbur, W. John Database (Oxford) BioCreative Virtual Issue The Comparative Toxicogenomics Database (CTD) contains manually curated literature that describes chemical–gene interactions, chemical–disease relationships and gene–disease relationships. Finding articles containing this information is the first and an important step to assist manual curation efficiency. However, the complex nature of named entities and their relationships make it challenging to choose relevant articles. In this article, we introduce a machine learning framework for prioritizing CTD-relevant articles based on our prior system for the protein–protein interaction article classification task in BioCreative III. To address new challenges in the CTD task, we explore a new entity identification method for genes, chemicals and diseases. In addition, latent topics are analyzed and used as a feature type to overcome the small size of the training set. Applied to the BioCreative 2012 Triage dataset, our method achieved 0.8030 mean average precision (MAP) in the official runs, resulting in the top MAP system among participants. Integrated with PubTator, a Web interface for annotating biomedical literature, the proposed system also received a positive review from the CTD curation team. Oxford University Press 2012-11-15 /pmc/articles/PMC3500521/ /pubmed/23160415 http://dx.doi.org/10.1093/database/bas042 Text en Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. 
For commercial re-use, please contact journals.permissions@oup.com.
spellingShingle BioCreative Virtual Issue
Kim, Sun
Kim, Won
Wei, Chih-Hsuan
Lu, Zhiyong
Wilbur, W. John
Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information
title Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information
topic BioCreative Virtual Issue
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3500521/
https://www.ncbi.nlm.nih.gov/pubmed/23160415
http://dx.doi.org/10.1093/database/bas042