A Bayesian method for comparing and combining binary classifiers in the absence of a gold standard

BACKGROUND: Many problems in bioinformatics involve classification based on features such as sequence, structure or morphology. Given multiple classifiers, two crucial questions arise: how does their performance compare, and how can they best be combined to produce a better classifier? A classifier...

Bibliographic Details
Main Authors:	Keith, Jonathan M; Davey, Christian M; Boyd, Sarah E
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3473310/ https://www.ncbi.nlm.nih.gov/pubmed/22838505 http://dx.doi.org/10.1186/1471-2105-13-179

_version_	1782246745597542400
author	Keith, Jonathan M; Davey, Christian M; Boyd, Sarah E
author_sort	Keith, Jonathan M
collection	PubMed
description	BACKGROUND: Many problems in bioinformatics involve classification based on features such as sequence, structure or morphology. Given multiple classifiers, two crucial questions arise: how does their performance compare, and how can they best be combined to produce a better classifier? A classifier can be evaluated in terms of sensitivity and specificity using benchmark, or gold standard, data, that is, data for which the true classification is known. However, a gold standard is not always available. Here we demonstrate that a Bayesian model for comparing medical diagnostics without a gold standard can be successfully applied in the bioinformatics domain, to genomic scale data sets. We present a new implementation, which unlike previous implementations is applicable to any number of classifiers. We apply this model, for the first time, to the problem of finding the globally optimal logical combination of classifiers. RESULTS: We compared three classifiers of protein subcellular localisation, and evaluated our estimates of sensitivity and specificity against estimates obtained using a gold standard. The method overestimated sensitivity and specificity with only a small discrepancy, and correctly ranked the classifiers. Diagnostic tests for swine flu were then compared on a small data set. Lastly, classifiers for a genome-wide association study of macular degeneration with 541094 SNPs were analysed. In all cases, run times were feasible, and results precise. The optimal logical combination of classifiers was also determined for all three data sets. Code and data are available from http://bioinformatics.monash.edu.au/downloads/. CONCLUSIONS: The examples demonstrate the methods are suitable for both small and large data sets, applicable to the wide range of bioinformatics classification problems, and robust to dependence between classifiers. In all three test cases, the globally optimal logical combination of the classifiers was found to be their union, according to three out of four ranking criteria. We propose as a general rule of thumb that the union of classifiers will be close to optimal.
format	Online Article Text
id	pubmed-3473310
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3473310 2012-10-23 A Bayesian method for comparing and combining binary classifiers in the absence of a gold standard Keith, Jonathan M; Davey, Christian M; Boyd, Sarah E BMC Bioinformatics Methodology Article BioMed Central 2012-07-27 /pmc/articles/PMC3473310/ /pubmed/22838505 http://dx.doi.org/10.1186/1471-2105-13-179 Text en Copyright ©2012 Keith et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A Bayesian method for comparing and combining binary classifiers in the absence of a gold standard
title_sort	bayesian method for comparing and combining binary classifiers in the absence of a gold standard
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3473310/ https://www.ncbi.nlm.nih.gov/pubmed/22838505 http://dx.doi.org/10.1186/1471-2105-13-179
