Conference Presentation Abstract

Presentation format: Poster (exhibit)    Submission number: 980319    OTPX-09
DEEP-LEARNING CLASSIFICATION OF OTITIS MEDIA WITH SYNTHETIC IMAGES USING STABLE DIFFUSION
DEPARTMENT OF CONVERGENCE MEDICINE, UNIVERSITY OF ULSAN COLLEGE OF MEDICINE, ASAN MEDICAL CENTER, SEOUL, KOREA¹; DEPARTMENT OF OTORHINOLARYNGOLOGY-HEAD AND NECK SURGERY, UNIVERSITY OF ULSAN COLLEGE OF MEDICINE, ASAN MEDICAL CENTER, SEOUL, KOREA²
JOONG HO AHN, KEUNWOO PARK¹, JIHYE CHAE¹, TAEGYEONG KIM², JIHOON KWEON¹, JOONG HO AHN²
Purpose: The preservation of anonymity in medical data is of paramount importance. The main objective of this study is to explore the feasibility and effectiveness of using synthetic images as a substitute for real medical datasets.

Methods: Otoendoscopic images of the tympanic membrane (TM) were obtained from patients visiting the otologic clinic at Asan Medical Center between January 2018 and December 2023. The images, selected based on the date of the clinic visit, were fully anonymized prior to their retrieval from the hospital system. They were then categorized into chronic otitis media (COM) and other conditions. Of the 9,053 images collected, 2,257 were diagnosed as COM. The images were divided into training, validation, and test sets at a ratio of 3:1:1, reformatted to 256×256×3 dimensions, and subjected to circular cropping. For binary classification of otitis media, the EfficientNet-B4 model was used. We employed data augmentation techniques that randomly applied rotation, translation shifts, zoom adjustments, horizontal flipping, brightness modifications, and downscaling. Transfer learning was applied using pre-trained weights from ImageNet, and categorical cross-entropy was used as the loss function. For the generative model, we used Stable Diffusion 2; to ensure validity, only the classification training set was used during fine-tuning. Training was optimized with the 8-bit Adam optimizer, coupled with xformers for memory-efficient attention, which improved the overall efficiency and effectiveness of the training phase. To assess the impact on classification performance, we systematically substituted the COM images in the training set with synthetically generated images at increments of 0%, 20%, 40%, 60%, 80%, and 100%.
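The circular cropping and synthetic-substitution steps described in the methods can be sketched as below. This is a minimal illustration, not the study's actual code: the function names, the zero-masking strategy for cropping, and the random choice of which COM images to replace are all assumptions.

```python
import numpy as np

def circular_crop(img: np.ndarray) -> np.ndarray:
    """Zero out pixels outside the largest circle inscribed in the image.

    Sketch of the 'circular cropping' step; the exact masking used in the
    study (e.g. crop-and-pad vs. zero-fill) is not specified in the abstract.
    """
    h, w = img.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= (min(h, w) / 2.0) ** 2
    return img * mask[..., None]

def substitute_synthetic(real_com, synthetic_com, fraction, seed=0):
    """Replace `fraction` of the real COM training images with synthetic ones.

    Mirrors the 0%-100% substitution experiment; the sampling scheme here
    (uniform random, fixed seed) is illustrative.
    """
    rng = np.random.default_rng(seed)
    n_replace = int(round(fraction * len(real_com)))
    keep_idx = rng.choice(len(real_com), size=len(real_com) - n_replace,
                          replace=False)
    kept = [real_com[i] for i in sorted(keep_idx)]
    synth = [str(s) for s in rng.choice(synthetic_com, size=n_replace,
                                        replace=False)]
    return kept + synth
```

For example, `substitute_synthetic(real_paths, synthetic_paths, 0.4)` yields a training list of the same size in which 40% of the COM images are synthetic.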
Results: In the evaluation of the F1 score for classification, the results varied with the proportion of synthetic data in the training set. With 0% synthetic data, the F1 score was at its highest, 95.01%. As the percentage of synthetic data increased to 20%, the F1 score decreased slightly to 94.49%. It dropped further to 93.79% with 40% synthetic data, to 93.63% at 60%, and to 93.18% at 80%; performance therefore remained relatively stable across these substitution levels. However, when the training set consisted entirely of synthetic data, the F1 score fell to 39.86%.

Conclusion: This study demonstrates the viability of using synthetic images generated via Stable Diffusion 2 for training deep learning models for otitis media classification. The integration of synthetic images into the training set was found to have minimal impact on model performance. Our findings underscore the potential of generative models in the domain of medical imaging, indicating their capability to effectively augment datasets.
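The F1 scores reported above follow the standard definition, the harmonic mean of precision and recall. A minimal sketch (not the authors' evaluation code) from per-class counts:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall.

    tp/fp/fn are true-positive, false-positive, and false-negative counts
    for the positive (COM) class.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

On a hypothetical confusion matrix with 9 true positives, 1 false positive, and 1 false negative, `f1_score(9, 1, 1)` gives 0.9.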

