
Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database

We report on the original integration of an automatic text categorization pipeline, so-called ToxiCat (Toxicogenomic Categorizer), that we developed to perform biomedical documents classification and prioritization in order to speed up the curation of the Comparative Toxicogenomics Database (CTD). T...


Bibliographic Details
Main authors:	Vishnyakova, Dina; Pasche, Emilie; Ruch, Patrick
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2012
Subjects:	Original Articles
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3514750/ https://www.ncbi.nlm.nih.gov/pubmed/23221176 http://dx.doi.org/10.1093/database/bas050

_version_	1782252073354526720
author	Vishnyakova, Dina; Pasche, Emilie; Ruch, Patrick
author_facet	Vishnyakova, Dina; Pasche, Emilie; Ruch, Patrick
author_sort	Vishnyakova, Dina
collection	PubMed
description	We report on the original integration of an automatic text categorization pipeline, so-called ToxiCat (Toxicogenomic Categorizer), that we developed to perform biomedical document classification and prioritization in order to speed up the curation of the Comparative Toxicogenomics Database (CTD). The task can be described as a binary classification task, where a scoring function is used to rank a selected set of articles. Components of a question-answering system are then used to extract CTD-specific annotations from the ranked list of articles. The ranking function is generated using a Support Vector Machine, which combines three main modules: an information retrieval engine for MEDLINE (EAGLi), a gene normalization service (NormaGene) developed for a previous BioCreative campaign and, finally, a set of answering components and entity recognizer for diseases and chemicals. The main components of the pipeline are publicly available both as a web application and as web services. The specific integration performed for the BioCreative competition is available via a web user interface at http://pingu.unige.ch:8080/Toxicat.
format	Online Article Text
id	pubmed-3514750
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-3514750 2012-12-05 Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database Vishnyakova, Dina; Pasche, Emilie; Ruch, Patrick Database (Oxford) Original Articles We report on the original integration of an automatic text categorization pipeline, so-called ToxiCat (Toxicogenomic Categorizer), that we developed to perform biomedical document classification and prioritization in order to speed up the curation of the Comparative Toxicogenomics Database (CTD). The task can be described as a binary classification task, where a scoring function is used to rank a selected set of articles. Components of a question-answering system are then used to extract CTD-specific annotations from the ranked list of articles. The ranking function is generated using a Support Vector Machine, which combines three main modules: an information retrieval engine for MEDLINE (EAGLi), a gene normalization service (NormaGene) developed for a previous BioCreative campaign and, finally, a set of answering components and entity recognizer for diseases and chemicals. The main components of the pipeline are publicly available both as a web application and as web services. The specific integration performed for the BioCreative competition is available via a web user interface at http://pingu.unige.ch:8080/Toxicat. Oxford University Press 2012-12-05 /pmc/articles/PMC3514750/ /pubmed/23221176 http://dx.doi.org/10.1093/database/bas050 Text en © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial reuse, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.
spellingShingle	Original Articles Vishnyakova, Dina Pasche, Emilie Ruch, Patrick Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database
title	Using binary classification to prioritize and curate articles for the Comparative Toxicogenomics Database
topic	Original Articles
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3514750/ https://www.ncbi.nlm.nih.gov/pubmed/23221176 http://dx.doi.org/10.1093/database/bas050
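
The description in this record frames prioritization as binary classification whose score is reused to rank articles. The following is a minimal sketch of that idea only, not ToxiCat's implementation: it trains a small linear SVM by batch subgradient descent and sorts candidate articles by decision score. The three features (a retrieval score, a gene-mention signal, a chemical/disease-mention signal), their standardized values, the hyperparameters, and the identifiers `doc_a`, `doc_b`, `doc_c` are all hypothetical; the record does not say which features or SVM settings the authors used.

```python
# Sketch: rank articles with the decision score of a linear SVM.
# All feature values and document ids below are made up for illustration.

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Minimize (lam/2)*||w||^2 + mean hinge loss by batch subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # example violates the margin: hinge loss is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def score(w, b, x):
    """Signed distance-like score; larger means more likely worth curating."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def rank_articles(w, b, candidates):
    """Return candidate ids sorted by descending SVM score."""
    return sorted(candidates, key=lambda k: score(w, b, candidates[k]), reverse=True)


# Hypothetical standardized features: [retrieval, gene signal, chemical/disease signal]
X_train = [
    [1.0, 0.8, 1.1], [0.7, 1.2, 0.6], [0.9, 0.5, 0.8],            # curatable (+1)
    [-0.9, -1.0, -0.7], [-1.1, -0.6, -1.0], [-0.6, -0.9, -0.8],   # not curatable (-1)
]
y_train = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X_train, y_train)

candidates = {
    "doc_b": [0.1, 0.2, 0.0],
    "doc_c": [-1.0, -0.8, -1.2],
    "doc_a": [1.2, 0.9, 1.1],
}
ranking = rank_articles(w, b, candidates)
print(ranking)
```

The binary labels are needed only for training; at curation time the continuous score orders the reading queue, so a curator can stop wherever the remaining articles are unlikely to be relevant.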
