Multi-center verification of the influence of data ratio of training sets on test results of an AI system for detecting early gastric cancer based on the YOLO-v4 algorithm

OBJECTIVE: Convolutional neural networks (CNNs) are increasingly being applied in the diagnosis of gastric cancer. However, the impact of the proportion of internal data in the training set on test results has not been sufficiently studied. Here, we constructed an artificial intelligence (AI) system called...

Bibliographic Details
Main Authors:	Jin, Tao; Jiang, Yancai; Mao, Boneng; Wang, Xing; Lu, Bo; Qian, Ji; Zhou, Hutao; Ma, Tieliang; Zhang, Yefei; Li, Sisi; Shi, Yun; Yao, Zhendong
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Oncology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9425091/ https://www.ncbi.nlm.nih.gov/pubmed/36052264 http://dx.doi.org/10.3389/fonc.2022.953090

author	Jin, Tao; Jiang, Yancai; Mao, Boneng; Wang, Xing; Lu, Bo; Qian, Ji; Zhou, Hutao; Ma, Tieliang; Zhang, Yefei; Li, Sisi; Shi, Yun; Yao, Zhendong
author_sort	Jin, Tao
collection	PubMed
description	OBJECTIVE: Convolutional neural networks (CNNs) are increasingly being applied in the diagnosis of gastric cancer. However, the impact of the proportion of internal data in the training set on test results has not been sufficiently studied. Here, we constructed an artificial intelligence (AI) system called EGC-YOLOV4 using the YOLO-v4 algorithm to explore the optimal training-set ratio for diagnosing early gastric cancer. DESIGN: A total of 22,0918 gastroscopic images from Yixing People’s Hospital were collected. Seven training-set models were established to identify four test sets. The respective sensitivity, specificity, Youden index, accuracy, and corresponding thresholds were tested, and ROC curves were plotted. RESULTS: 1. The EGC-YOLOV4 system completed all tests at an average reading speed of about 15 ms/image. 2. The AUC values in the training set 1 model were 0.8325, 0.8307, 0.8706, and 0.8279; in the training set 2 model, 0.8674, 0.8635, 0.9056, and 0.9249; in the training set 3 model, 0.8544, 0.8881, 0.9072, and 0.9237; in the training set 4 model, 0.8271, 0.9020, 0.9102, and 0.9316; in the training set 5 model, 0.8249, 0.8484, 0.8796, and 0.8931; in the training set 6 model, 0.8235, 0.8539, 0.9002, and 0.9051; and in the training set 7 model, 0.7581, 0.8082, 0.8803, and 0.8763. CONCLUSION: EGC-YOLOV4 can quickly and accurately identify early gastric cancer lesions in gastroscopic images and generalizes well. The proportion of positive and negative samples in the training set affects the overall diagnostic performance of the AI. In this study, the optimal ratio of positive to negative samples in the training set was 1:1 to 1:2.
format	Online Article Text
id	pubmed-9425091
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9425091 2022-08-31 Front Oncol Oncology Frontiers Media S.A. 2022-08-16 /pmc/articles/PMC9425091/ /pubmed/36052264 http://dx.doi.org/10.3389/fonc.2022.953090 Text en Copyright © 2022 Jin, Jiang, Mao, Wang, Lu, Qian, Zhou, Ma, Zhang, Li, Shi and Yao https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Multi-center verification of the influence of data ratio of training sets on test results of an AI system for detecting early gastric cancer based on the YOLO-v4 algorithm
topic	Oncology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9425091/ https://www.ncbi.nlm.nih.gov/pubmed/36052264 http://dx.doi.org/10.3389/fonc.2022.953090

Similar Items