
Enhancing navigation in biomedical databases by community voting and database-driven text classification

BACKGROUND: The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a mean...


Bibliographic Details
Main authors:	Duchrow, Timo, Shtatland, Timur, Guettler, Daniel, Pivovarov, Misha, Kramer, Stefan, Weissleder, Ralph
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2768718/ https://www.ncbi.nlm.nih.gov/pubmed/19799796 http://dx.doi.org/10.1186/1471-2105-10-317

_version_	1782173497820184576
author	Duchrow, Timo Shtatland, Timur Guettler, Daniel Pivovarov, Misha Kramer, Stefan Weissleder, Ralph
author_sort	Duchrow, Timo
collection	PubMed
description	BACKGROUND: The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them. RESULTS: Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly. 
CONCLUSION: Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases. The system can be accessed at .
format	Text
id	pubmed-2768718
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2768718 2009-10-28 Enhancing navigation in biomedical databases by community voting and database-driven text classification Duchrow, Timo Shtatland, Timur Guettler, Daniel Pivovarov, Misha Kramer, Stefan Weissleder, Ralph BMC Bioinformatics Research Article BioMed Central 2009-10-03 /pmc/articles/PMC2768718/ /pubmed/19799796 http://dx.doi.org/10.1186/1471-2105-10-317 Text en Copyright © 2009 Duchrow et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Duchrow, Timo Shtatland, Timur Guettler, Daniel Pivovarov, Misha Kramer, Stefan Weissleder, Ralph Enhancing navigation in biomedical databases by community voting and database-driven text classification
title	Enhancing navigation in biomedical databases by community voting and database-driven text classification
title_sort	enhancing navigation in biomedical databases by community voting and database-driven text classification
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2768718/ https://www.ncbi.nlm.nih.gov/pubmed/19799796 http://dx.doi.org/10.1186/1471-2105-10-317

