Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information
The Comparative Toxicogenomics Database (CTD) contains manually curated literature that describes chemical–gene interactions, chemical–disease relationships and gene–disease relationships. Finding articles containing this information is the first and an important step to assist manual curation effic...
Main authors: | Kim, Sun; Kim, Won; Wei, Chih-Hsuan; Lu, Zhiyong; Wilbur, W. John |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2012 |
Subjects: | BioCreative Virtual Issue |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3500521/ https://www.ncbi.nlm.nih.gov/pubmed/23160415 http://dx.doi.org/10.1093/database/bas042 |
_version_ | 1782250116984340480 |
---|---|
author | Kim, Sun Kim, Won Wei, Chih-Hsuan Lu, Zhiyong Wilbur, W. John |
author_facet | Kim, Sun Kim, Won Wei, Chih-Hsuan Lu, Zhiyong Wilbur, W. John |
author_sort | Kim, Sun |
collection | PubMed |
description | The Comparative Toxicogenomics Database (CTD) contains manually curated literature that describes chemical–gene interactions, chemical–disease relationships and gene–disease relationships. Finding articles containing this information is the first and an important step to assist manual curation efficiency. However, the complex nature of named entities and their relationships makes it challenging to choose relevant articles. In this article, we introduce a machine learning framework for prioritizing CTD-relevant articles based on our prior system for the protein–protein interaction article classification task in BioCreative III. To address new challenges in the CTD task, we explore a new entity identification method for genes, chemicals and diseases. In addition, latent topics are analyzed and used as a feature type to overcome the small size of the training set. Applied to the BioCreative 2012 Triage dataset, our method achieved 0.8030 mean average precision (MAP) in the official runs, resulting in the top MAP system among participants. Integrated with PubTator, a Web interface for annotating biomedical literature, the proposed system also received a positive review from the CTD curation team. |
format | Online Article Text |
id | pubmed-3500521 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3500521 2012-11-19 Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information Kim, Sun Kim, Won Wei, Chih-Hsuan Lu, Zhiyong Wilbur, W. John Database (Oxford) BioCreative Virtual Issue The Comparative Toxicogenomics Database (CTD) contains manually curated literature that describes chemical–gene interactions, chemical–disease relationships and gene–disease relationships. Finding articles containing this information is the first and an important step to assist manual curation efficiency. However, the complex nature of named entities and their relationships makes it challenging to choose relevant articles. In this article, we introduce a machine learning framework for prioritizing CTD-relevant articles based on our prior system for the protein–protein interaction article classification task in BioCreative III. To address new challenges in the CTD task, we explore a new entity identification method for genes, chemicals and diseases. In addition, latent topics are analyzed and used as a feature type to overcome the small size of the training set. Applied to the BioCreative 2012 Triage dataset, our method achieved 0.8030 mean average precision (MAP) in the official runs, resulting in the top MAP system among participants. Integrated with PubTator, a Web interface for annotating biomedical literature, the proposed system also received a positive review from the CTD curation team. Oxford University Press 2012-11-15 /pmc/articles/PMC3500521/ /pubmed/23160415 http://dx.doi.org/10.1093/database/bas042 Text en Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com. |
spellingShingle | BioCreative Virtual Issue Kim, Sun Kim, Won Wei, Chih-Hsuan Lu, Zhiyong Wilbur, W. John Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information |
title | Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information |
title_full | Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information |
title_fullStr | Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information |
title_full_unstemmed | Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information |
title_short | Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information |
title_sort | prioritizing pubmed articles for the comparative toxicogenomic database utilizing semantic information |
topic | BioCreative Virtual Issue |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3500521/ https://www.ncbi.nlm.nih.gov/pubmed/23160415 http://dx.doi.org/10.1093/database/bas042 |