Classification of imbalanced oral cancer image data from high-risk population

Significance: Early detection of oral cancer is vital for high-risk patients, and machine learning-based automatic classification is ideal for disease screening. However, current datasets collected from high-risk populations are unbalanced and often have detrimental effects on the performance of cla...

Bibliographic Details
Main Authors: Song, Bofan; Li, Shaobai; Sunny, Sumsum; Gurushanth, Keerthi; Mendonca, Pramila; Mukhia, Nirza; Patrick, Sanjana; Gurudath, Shubha; Raghavan, Subhashini; Tsusennaro, Imchen; Leivon, Shirley T.; Kolur, Trupti; Shetty, Vivek; Bushan, Vidya; Ramesh, Rohan; Peterson, Tyler; Pillai, Vijay; Wilder-Smith, Petra; Sigamani, Alben; Suresh, Amritha; Kuriakose, Moni Abraham; Birur, Praveen; Liang, Rongguang
Format: Online Article Text
Language: English
Published: Society of Photo-Optical Instrumentation Engineers 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8536945/
https://www.ncbi.nlm.nih.gov/pubmed/34689442
http://dx.doi.org/10.1117/1.JBO.26.10.105001
author Song, Bofan
Li, Shaobai
Sunny, Sumsum
Gurushanth, Keerthi
Mendonca, Pramila
Mukhia, Nirza
Patrick, Sanjana
Gurudath, Shubha
Raghavan, Subhashini
Tsusennaro, Imchen
Leivon, Shirley T.
Kolur, Trupti
Shetty, Vivek
Bushan, Vidya
Ramesh, Rohan
Peterson, Tyler
Pillai, Vijay
Wilder-Smith, Petra
Sigamani, Alben
Suresh, Amritha
Kuriakose, Moni Abraham
Birur, Praveen
Liang, Rongguang
collection PubMed
description Significance: Early detection of oral cancer is vital for high-risk patients, and machine learning-based automatic classification is ideal for disease screening. However, current datasets collected from high-risk populations are unbalanced, which often degrades classification performance. Aim: To reduce the class bias caused by data imbalance. Approach: We collected 3851 polarized white light cheek mucosa images using our customized oral cancer screening device. We use weight balancing, data augmentation, undersampling, focal loss, and ensemble methods to improve the neural network performance of oral cancer image classification with the imbalanced multi-class datasets captured from high-risk populations during oral cancer screening in low-resource settings. Results: By applying both data-level and algorithm-level approaches to the deep learning training process, the performance on the minority classes, which were difficult to distinguish at the beginning, has been improved. The accuracy of the “premalignancy” class also increased, which is ideal for screening applications. Conclusions: Experimental results show that the class bias induced by imbalanced oral cancer image datasets could be reduced using both data- and algorithm-level methods. Our study may provide an important basis for understanding the influence of unbalanced datasets on oral cancer deep learning classifiers and how to mitigate it.
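The abstract names weight balancing, undersampling, and focal loss without giving formulas, so the following is only a minimal sketch of the textbook versions of those three techniques, not the authors' implementation. The class names other than "premalignancy", the class counts, the `gamma` value, and all function names are illustrative assumptions.

```python
import math
import random
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count[c])."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs maps each class to its predicted probability; alpha optionally
    maps each class to a weight (for example, balanced_class_weights).
    """
    p_t = min(max(probs[target], 1e-12), 1.0)  # clamp to avoid log(0)
    a_t = 1.0 if alpha is None else alpha[target]
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)


def random_undersample(samples, labels, seed=0):
    """Keep as many samples per class as the rarest class has."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    m = min(len(v) for v in by_class.values())
    out = []
    for y, items in by_class.items():
        out.extend((s, y) for s in rng.sample(items, m))
    rng.shuffle(out)
    return out


# Hypothetical toy label distribution (not the counts from the study).
labels = ["normal"] * 6 + ["premalignancy"] * 3 + ["malignancy"] * 1
weights = balanced_class_weights(labels)

# A confident correct prediction is down-weighted as gamma grows.
probs = {"normal": 0.9, "premalignancy": 0.07, "malignancy": 0.03}
loss_ce = focal_loss(probs, "normal", gamma=0.0)   # plain cross-entropy
loss_fl = focal_loss(probs, "normal", gamma=2.0)   # focal loss

balanced = random_undersample(list(range(len(labels))), labels)
```

With `gamma=0` and no `alpha`, the focal loss reduces to ordinary cross-entropy; raising `gamma` shrinks the contribution of well-classified samples so that hard, often minority-class, samples dominate the gradient.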
format Online
Article
Text
id pubmed-8536945
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Society of Photo-Optical Instrumentation Engineers
record_format MEDLINE/PubMed
spelling pubmed-8536945 2021-10-25 Classification of imbalanced oral cancer image data from high-risk population J Biomed Opt General
Society of Photo-Optical Instrumentation Engineers 2021-10-23 2021-10 /pmc/articles/PMC8536945/ /pubmed/34689442 http://dx.doi.org/10.1117/1.JBO.26.10.105001 Text en © 2021 The Authors https://creativecommons.org/licenses/by/4.0/ Published by SPIE under a Creative Commons Attribution 4.0 International License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
title Classification of imbalanced oral cancer image data from high-risk population
topic General