DEPARTMENT OF CONVERGENCE MEDICINE, UNIVERSITY OF ULSAN COLLEGE OF MEDICINE, ASAN MEDICAL CENTER, SEOUL, KOREA¹; DEPARTMENT OF OTORHINOLARYNGOLOGY-HEAD AND NECK SURGERY, UNIVERSITY OF ULSAN COLLEGE OF MEDICINE, ASAN MEDICAL CENTER, SEOUL, KOREA²
Objective: The preservation of anonymity in medical data is of paramount
importance. The main objective of this study is to explore the
feasibility and effectiveness of using synthetic images as a substitute
for real medical datasets. Methods: Otoendoscopic images of the tympanic membrane (TM) were obtained from
patients visiting the otologic clinic at Asan Medical Center between
January 2018 and December 2023. The images, selected based on the date
of the clinic visit, were fully anonymized prior to their retrieval
from the hospital system. These images were then categorized into
chronic otitis media (COM) and other conditions. Out of the total
9,053 images collected, 2,257 were diagnosed as COM.
The images were divided into training, validation, and test sets at a
ratio of 3:1:1. Images were resized to 256×256×3 dimensions and
subjected to circular cropping.
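The abstract does not include preprocessing code; a minimal sketch of the resizing and circular cropping step is shown below (the function name and masking details are our illustration, not the study's implementation — circular cropping here means blacking out the corners outside the inscribed circle of the endoscopic field).

```python
import numpy as np
from PIL import Image

def circular_crop(img: Image.Image, size: int = 256) -> np.ndarray:
    """Resize an otoendoscopic image to size x size x 3 and mask out
    pixels outside the inscribed circle (illustrative sketch)."""
    arr = np.asarray(img.convert("RGB").resize((size, size))).copy()
    yy, xx = np.ogrid[:size, :size]
    c = (size - 1) / 2.0  # image center
    outside = (yy - c) ** 2 + (xx - c) ** 2 > (size / 2.0) ** 2
    arr[outside] = 0  # black out corners outside the endoscopic field
    return arr
```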
For otitis media binary classification, the EfficientNet-B4 model was
used. We employed data augmentation techniques that randomly
incorporated rotation, translation shifts, zoom adjustments,
horizontal flipping, brightness modifications, and downscaling.
Transfer learning was applied using pre-trained weights from ImageNet.
Categorical cross-entropy was used as the loss function.
For the generative model, we used Stable Diffusion 2 and, to ensure a
valid evaluation, fine-tuned it exclusively on the classification
training set. Fine-tuning was optimized with the 8-bit Adam optimizer
and xformers memory-efficient attention, which together reduced memory
usage and improved training efficiency.
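The abstract does not include the fine-tuning command. One common way to reproduce this setup is Hugging Face diffusers' example text-to-image fine-tuning script, which exposes the 8-bit Adam and xformers options mentioned above; all paths and hyperparameters below are placeholders, not values from the study.

```shell
# Placeholder paths/hyperparameters; requires diffusers, bitsandbytes, xformers.
accelerate launch train_text_to_image.py \
  --pretrained_model_name_or_path stabilityai/stable-diffusion-2 \
  --train_data_dir ./classification_train_set \
  --resolution 256 \
  --train_batch_size 4 \
  --learning_rate 1e-5 \
  --max_train_steps 10000 \
  --use_8bit_adam \
  --enable_xformers_memory_efficient_attention \
  --output_dir ./sd2-tm-finetuned
```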
To assess the impact on classification performance, we systematically
substituted the COM images in the training set with synthetically
generated images at increments of 0%, 20%, 40%, 60%, 80%, and 100%.
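The substitution protocol above can be sketched as follows; function and variable names are ours, not the study's.

```python
import random

def substitute_synthetic(real_paths, synthetic_paths, fraction, seed=0):
    """Replace `fraction` of the real COM training images with synthetic
    ones, keeping the training-set size constant (illustrative sketch)."""
    rng = random.Random(seed)
    n_replace = round(len(real_paths) * fraction)
    kept_real = rng.sample(real_paths, len(real_paths) - n_replace)
    return kept_real + synthetic_paths[:n_replace]
```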
Results: In the evaluation of the F1 score for classification, the results varied
with different proportions of synthetic data in the training set.
Initially, with 0% synthetic data, the F1 score was at its highest,
recorded at 95.01%. As the percentage of synthetic data increased to
20%, there was a slight decrease in the F1 score to 94.49%. Continuing
this trend, the F1 score further dropped to 93.79% with 40% synthetic
data, and then to 93.63% at 60% synthetic data. At 80% synthetic data,
the F1 score was 93.18%; performance thus remained relatively stable
up to 80% substitution. However, when the training set consisted
entirely of synthetic data, the F1 score fell sharply to 39.86%. Conclusion: This study demonstrates the viability of using synthetic images
generated via Stable Diffusion 2 for training deep learning models in
the field of otitis media classification. Substituting up to 80% of
the real COM training images with synthetic ones had minimal impact on
model performance. Our findings underscore the potential of generative
models in the domain of medical imaging, indicating their capability
to effectively augment datasets.