Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database

We report on the original integration of an automatic text categorization pipeline, ToxiCat (Toxicogenomic Categorizer), which we developed to classify and prioritize biomedical documents in order to speed up the curation of the Comparative Toxicogenomics Database (CTD). T...

Bibliographic Details
Main Authors: Vishnyakova, Dina, Pasche, Emilie, Ruch, Patrick
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3514750/
https://www.ncbi.nlm.nih.gov/pubmed/23221176
http://dx.doi.org/10.1093/database/bas050
_version_ 1782252073354526720
author Vishnyakova, Dina
Pasche, Emilie
Ruch, Patrick
author_facet Vishnyakova, Dina
Pasche, Emilie
Ruch, Patrick
author_sort Vishnyakova, Dina
collection PubMed
description We report on the original integration of an automatic text categorization pipeline, ToxiCat (Toxicogenomic Categorizer), which we developed to classify and prioritize biomedical documents in order to speed up the curation of the Comparative Toxicogenomics Database (CTD). The task can essentially be described as binary classification, in which a scoring function is used to rank a selected set of articles. Components of a question-answering system are then used to extract CTD-specific annotations from the ranked list of articles. The ranking function is generated using a Support Vector Machine, which combines three main modules: an information retrieval engine for MEDLINE (EAGLi), a gene normalization service (NormaGene) developed for a previous BioCreative campaign, and a set of answering components and entity recognizers for diseases and chemicals. The main components of the pipeline are publicly available both as a web application and as web services. The specific integration performed for the BioCreative competition is available via a web user interface at http://pingu.unige.ch:8080/Toxicat.
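The ranking idea in the description (a binary classifier whose score orders a set of articles) can be illustrated with a minimal sketch. It trains a small linear SVM by subgradient descent on the hinge loss and then sorts candidate articles by their decision score. Everything specific here is an assumption made for illustration: the three features (a retrieval score, a gene-normalization signal, a chemical/disease entity signal), the toy values, the document identifiers, and the hyperparameters are hypothetical and are not taken from ToxiCat, which may rely on a different SVM implementation, kernel, and feature set.

```python
def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=200):
    """Fit w, b by subgradient descent on the L2-regularized hinge loss.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    lr, lam and epochs are illustrative defaults, not tuned values.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * (dot(w, xi) + b) < 1.0
            # Shrink the weights (gradient of the L2 penalty).
            w = [wj * (1.0 - lr * lam) for wj in w]
            if violated:
                # Hinge-loss subgradient step for a margin violation.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def rank_articles(articles, w, b):
    """Return (doc_id, score) pairs sorted from highest to lowest score."""
    scored = [(doc_id, dot(w, feats) + b) for doc_id, feats in articles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical training set: [retrieval, gene signal, chemical/disease signal].
X_train = [
    [0.9, 0.8, 0.7], [0.2, 0.1, 0.0],
    [0.8, 0.9, 0.9], [0.1, 0.3, 0.2],
    [0.7, 0.6, 0.8], [0.3, 0.0, 0.1],
]
y_train = [+1, -1, +1, -1, +1, -1]  # +1 = worth curating, -1 = not

w, b = train_linear_svm(X_train, y_train)

# Hypothetical unlabeled candidates to prioritize.
candidates = {
    "doc_c": [0.1, 0.1, 0.1],
    "doc_a": [0.9, 0.9, 0.9],
    "doc_b": [0.5, 0.5, 0.5],
}
ranking = rank_articles(candidates, w, b)
for doc_id, s in ranking:
    print(doc_id, round(s, 3))
```

The classifier is used here only for its real-valued score, not for a hard yes/no label: sorting by that score is what turns binary classification into a prioritized reading list for curators.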
format Online
Article
Text
id pubmed-3514750
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3514750 2012-12-05 Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database Vishnyakova, Dina Pasche, Emilie Ruch, Patrick Database (Oxford) Original Articles We report on the original integration of an automatic text categorization pipeline, ToxiCat (Toxicogenomic Categorizer), which we developed to classify and prioritize biomedical documents in order to speed up the curation of the Comparative Toxicogenomics Database (CTD). The task can essentially be described as binary classification, in which a scoring function is used to rank a selected set of articles. Components of a question-answering system are then used to extract CTD-specific annotations from the ranked list of articles. The ranking function is generated using a Support Vector Machine, which combines three main modules: an information retrieval engine for MEDLINE (EAGLi), a gene normalization service (NormaGene) developed for a previous BioCreative campaign, and a set of answering components and entity recognizers for diseases and chemicals. The main components of the pipeline are publicly available both as a web application and as web services. The specific integration performed for the BioCreative competition is available via a web user interface at http://pingu.unige.ch:8080/Toxicat. Oxford University Press 2012-12-05 /pmc/articles/PMC3514750/ /pubmed/23221176 http://dx.doi.org/10.1093/database/bas050 Text en © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.
spellingShingle Original Articles
Vishnyakova, Dina
Pasche, Emilie
Ruch, Patrick
Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database
title Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database
title_full Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database
title_fullStr Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database
title_full_unstemmed Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database
title_short Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database
title_sort using binary classification to prioritize and curate articles for the comparative toxicogenomics database
topic Original Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3514750/
https://www.ncbi.nlm.nih.gov/pubmed/23221176
http://dx.doi.org/10.1093/database/bas050
work_keys_str_mv AT vishnyakovadina usingbinaryclassificationtoprioritizeandcuratearticlesforthecomparativetoxicogenomicsdatabase
AT pascheemilie usingbinaryclassificationtoprioritizeandcuratearticlesforthecomparativetoxicogenomicsdatabase
AT ruchpatrick usingbinaryclassificationtoprioritizeandcuratearticlesforthecomparativetoxicogenomicsdatabase