
An active learning based classification strategy for the minority class problem: application to histopathology annotation

BACKGROUND: Supervised classifiers for digital pathology can improve the ability of physicians to detect and diagnose diseases such as cancer. Generating training data for classifiers is problematic, since only domain experts (e.g. pathologists) can correctly label ground truth data. Additionally, d...


Bibliographic Details
Main authors: Doyle, Scott; Monaco, James; Feldman, Michael; Tomaszewski, John; Madabhushi, Anant
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3284114/
https://www.ncbi.nlm.nih.gov/pubmed/22034914
http://dx.doi.org/10.1186/1471-2105-12-424
author Doyle, Scott
Monaco, James
Feldman, Michael
Tomaszewski, John
Madabhushi, Anant
collection PubMed
description BACKGROUND: Supervised classifiers for digital pathology can improve the ability of physicians to detect and diagnose diseases such as cancer. Generating training data for classifiers is problematic, since only domain experts (e.g. pathologists) can correctly label ground truth data. Additionally, digital pathology datasets suffer from the "minority class problem", an issue where the number of exemplars from the non-target class outnumber target class exemplars which can bias the classifier and reduce accuracy. In this paper, we develop a training strategy combining active learning (AL) with class-balancing. AL identifies unlabeled samples that are "informative" (i.e. likely to increase classifier performance) for annotation, avoiding non-informative samples. This yields high accuracy with a smaller training set size compared with random learning (RL). Previous AL methods have not explicitly accounted for the minority class problem in biomedical images. Pre-specifying a target class ratio mitigates the problem of training bias. Finally, we develop a mathematical model to predict the number of annotations (cost) required to achieve balanced training classes. In addition to predicting training cost, the model reveals the theoretical properties of AL in the context of the minority class problem.
RESULTS: Using this class-balanced AL training strategy (CBAL), we build a classifier to distinguish cancer from non-cancer regions on digitized prostate histopathology. Our dataset consists of 12,000 image regions sampled from 100 biopsies (58 prostate cancer patients). We compare CBAL against: (1) unbalanced AL (UBAL), which uses AL but ignores class ratio; (2) class-balanced RL (CBRL), which uses RL with a specific class ratio; and (3) unbalanced RL (UBRL). The CBAL-trained classifier yields 2% greater accuracy and 3% higher area under the receiver operating characteristic curve (AUC) than alternatively-trained classifiers. Our cost model accurately predicts the number of annotations necessary to obtain balanced classes. The accuracy of our prediction is verified by empirically-observed costs. Finally, we find that over-sampling the minority class yields a marginal improvement in classifier accuracy but the improved performance comes at the expense of greater annotation cost.
CONCLUSIONS: We have combined AL with class balancing to yield a general training strategy applicable to most supervised classification problems where the dataset is expensive to obtain and which suffers from the minority class problem. An intelligent training strategy is a critical component of supervised classification, but the integration of AL and intelligent choice of class ratios, as well as the application of a general cost model, will help researchers to plan the training process more quickly and effectively.
format Online
Article
Text
id pubmed-3284114
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3284114 2012-02-23 An active learning based classification strategy for the minority class problem: application to histopathology annotation. Doyle, Scott; Monaco, James; Feldman, Michael; Tomaszewski, John; Madabhushi, Anant. BMC Bioinformatics. Research Article. BioMed Central 2011-10-28 /pmc/articles/PMC3284114/ /pubmed/22034914 http://dx.doi.org/10.1186/1471-2105-12-424 Text en Copyright ©2011 Doyle et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An active learning based classification strategy for the minority class problem: application to histopathology annotation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3284114/
https://www.ncbi.nlm.nih.gov/pubmed/22034914
http://dx.doi.org/10.1186/1471-2105-12-424