Classification of imbalanced oral cancer image data from high-risk population
Significance: Early detection of oral cancer is vital for high-risk patients, and machine learning-based automatic classification is ideal for disease screening. However, current datasets collected from high-risk populations are unbalanced and often have detrimental effects on the performance of cla...
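Main Authors: | Song, Bofan; Li, Shaobai; Sunny, Sumsum; Gurushanth, Keerthi; Mendonca, Pramila; Mukhia, Nirza; Patrick, Sanjana; Gurudath, Shubha; Raghavan, Subhashini; Tsusennaro, Imchen; Leivon, Shirley T.; Kolur, Trupti; Shetty, Vivek; Bushan, Vidya; Ramesh, Rohan; Peterson, Tyler; Pillai, Vijay; Wilder-Smith, Petra; Sigamani, Alben; Suresh, Amritha; Kuriakose, Moni Abraham; Birur, Praveen; Liang, Rongguang |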
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Society of Photo-Optical Instrumentation Engineers 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8536945/ https://www.ncbi.nlm.nih.gov/pubmed/34689442 http://dx.doi.org/10.1117/1.JBO.26.10.105001 |
_version_ | 1784588130857254912 |
---|---|
author | Song, Bofan; Li, Shaobai; Sunny, Sumsum; Gurushanth, Keerthi; Mendonca, Pramila; Mukhia, Nirza; Patrick, Sanjana; Gurudath, Shubha; Raghavan, Subhashini; Tsusennaro, Imchen; Leivon, Shirley T.; Kolur, Trupti; Shetty, Vivek; Bushan, Vidya; Ramesh, Rohan; Peterson, Tyler; Pillai, Vijay; Wilder-Smith, Petra; Sigamani, Alben; Suresh, Amritha; Kuriakose, Moni Abraham; Birur, Praveen; Liang, Rongguang |
author_facet | Song, Bofan; Li, Shaobai; Sunny, Sumsum; Gurushanth, Keerthi; Mendonca, Pramila; Mukhia, Nirza; Patrick, Sanjana; Gurudath, Shubha; Raghavan, Subhashini; Tsusennaro, Imchen; Leivon, Shirley T.; Kolur, Trupti; Shetty, Vivek; Bushan, Vidya; Ramesh, Rohan; Peterson, Tyler; Pillai, Vijay; Wilder-Smith, Petra; Sigamani, Alben; Suresh, Amritha; Kuriakose, Moni Abraham; Birur, Praveen; Liang, Rongguang |
author_sort | Song, Bofan |
collection | PubMed |
description | Significance: Early detection of oral cancer is vital for high-risk patients, and machine learning-based automatic classification is ideal for disease screening. However, current datasets collected from high-risk populations are unbalanced, which often has detrimental effects on classification performance. Aim: To reduce the class bias caused by data imbalance. Approach: We collected 3851 polarized white light cheek mucosa images using our customized oral cancer screening device. We use weight balancing, data augmentation, undersampling, focal loss, and ensemble methods to improve the neural network performance of oral cancer image classification with the imbalanced multi-class datasets captured from high-risk populations during oral cancer screening in low-resource settings. Results: By applying both data-level and algorithm-level approaches to the deep learning training process, the performance on the minority classes, which were difficult to distinguish at the beginning, has been improved. The accuracy of the “premalignancy” class is also increased, which is ideal for screening applications. Conclusions: Experimental results show that the class bias induced by imbalanced oral cancer image datasets could be reduced using both data- and algorithm-level methods. Our study may provide an important basis for understanding the influence of unbalanced datasets on oral cancer deep learning classifiers and how to mitigate it. |
format | Online Article Text |
id | pubmed-8536945 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Society of Photo-Optical Instrumentation Engineers |
record_format | MEDLINE/PubMed |
spelling | pubmed-8536945 2021-10-25 Classification of imbalanced oral cancer image data from high-risk population Song, Bofan Li, Shaobai Sunny, Sumsum Gurushanth, Keerthi Mendonca, Pramila Mukhia, Nirza Patrick, Sanjana Gurudath, Shubha Raghavan, Subhashini Tsusennaro, Imchen Leivon, Shirley T. Kolur, Trupti Shetty, Vivek Bushan, Vidya Ramesh, Rohan Peterson, Tyler Pillai, Vijay Wilder-Smith, Petra Sigamani, Alben Suresh, Amritha Kuriakose, Moni Abraham Birur, Praveen Liang, Rongguang J Biomed Opt General Significance: Early detection of oral cancer is vital for high-risk patients, and machine learning-based automatic classification is ideal for disease screening. However, current datasets collected from high-risk populations are unbalanced, which often has detrimental effects on classification performance. Aim: To reduce the class bias caused by data imbalance. Approach: We collected 3851 polarized white light cheek mucosa images using our customized oral cancer screening device. We use weight balancing, data augmentation, undersampling, focal loss, and ensemble methods to improve the neural network performance of oral cancer image classification with the imbalanced multi-class datasets captured from high-risk populations during oral cancer screening in low-resource settings. Results: By applying both data-level and algorithm-level approaches to the deep learning training process, the performance on the minority classes, which were difficult to distinguish at the beginning, has been improved. The accuracy of the “premalignancy” class is also increased, which is ideal for screening applications. Conclusions: Experimental results show that the class bias induced by imbalanced oral cancer image datasets could be reduced using both data- and algorithm-level methods. Our study may provide an important basis for understanding the influence of unbalanced datasets on oral cancer deep learning classifiers and how to mitigate it. Society of Photo-Optical Instrumentation Engineers 2021-10-23 2021-10 /pmc/articles/PMC8536945/ /pubmed/34689442 http://dx.doi.org/10.1117/1.JBO.26.10.105001 Text en © 2021 The Authors https://creativecommons.org/licenses/by/4.0/ Published by SPIE under a Creative Commons Attribution 4.0 International License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI. |
spellingShingle | General Song, Bofan Li, Shaobai Sunny, Sumsum Gurushanth, Keerthi Mendonca, Pramila Mukhia, Nirza Patrick, Sanjana Gurudath, Shubha Raghavan, Subhashini Tsusennaro, Imchen Leivon, Shirley T. Kolur, Trupti Shetty, Vivek Bushan, Vidya Ramesh, Rohan Peterson, Tyler Pillai, Vijay Wilder-Smith, Petra Sigamani, Alben Suresh, Amritha Kuriakose, Moni Abraham Birur, Praveen Liang, Rongguang Classification of imbalanced oral cancer image data from high-risk population |
title | Classification of imbalanced oral cancer image data from high-risk population |
title_full | Classification of imbalanced oral cancer image data from high-risk population |
title_fullStr | Classification of imbalanced oral cancer image data from high-risk population |
title_full_unstemmed | Classification of imbalanced oral cancer image data from high-risk population |
title_short | Classification of imbalanced oral cancer image data from high-risk population |
title_sort | classification of imbalanced oral cancer image data from high-risk population |
topic | General |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8536945/ https://www.ncbi.nlm.nih.gov/pubmed/34689442 http://dx.doi.org/10.1117/1.JBO.26.10.105001 |
work_keys_str_mv | AT songbofan classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT lishaobai classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT sunnysumsum classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT gurushanthkeerthi classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT mendoncapramila classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT mukhianirza classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT patricksanjana classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT gurudathshubha classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT raghavansubhashini classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT tsusennaroimchen classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT leivonshirleyt classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT kolurtrupti classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT shettyvivek classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT bushanvidya classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT rameshrohan classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT petersontyler classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT pillaivijay classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT wildersmithpetra classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT sigamanialben classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT sureshamritha classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT kuriakosemoniabraham classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT birurpraveen classificationofimbalancedoralcancerimagedatafromhighriskpopulation AT liangrongguang classificationofimbalancedoralcancerimagedatafromhighriskpopulation |
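
The description in this record names weight balancing and focal loss among the algorithm-level methods. As a rough illustration only, the sketch below shows one common way these two ideas are computed. It is not taken from the article: the inverse-frequency weighting rule, the function names, the default `gamma` and `alpha` values, and the toy label counts are all assumptions made for the example.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p_t)**gamma * log(p_t).

    p_t is the predicted probability of the true class. With gamma = 0 and
    alpha = 1 this reduces to ordinary cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical toy labels, NOT the class distribution of the 3851-image dataset.
labels = ["normal"] * 8 + ["premalignancy"] * 2
weights = inverse_frequency_weights(labels)

# A confidently correct prediction is down-weighted relative to cross-entropy.
easy_ce = focal_loss([2.0, 0.0], target=0, gamma=0.0)
easy_focal = focal_loss([2.0, 0.0], target=0, gamma=2.0)
```

In this sketch the rarer class receives the larger weight, and the per-class weight could be passed as `alpha` so that both mechanisms act on the same sample. Whether the article combines them this way is not stated in the record.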