Conference Presentation Abstract

Presentation Format: Submission No. 990047    HNOP 2-2
DEVELOPMENT OF A SWIN TRANSFORMER-BASED ARTIFICIAL INTELLIGENCE MODEL FOR AUTOMATIC SELECTION OF HIGH-QUALITY LARYNGEAL ENDOSCOPIC IMAGES
DEPARTMENT OF OTORHINOLARYNGOLOGY-HEAD AND NECK SURGERY, KOREA UNIVERSITY ANSAN HOSPITAL, KOREA UNIVERSITY COLLEGE OF MEDICINE1; DEPARTMENT OF BIOMEDICAL INFORMATICS, KOREA UNIVERSITY COLLEGE OF MEDICINE2
JEONG HWAN KIM1,2, YEONGMIN KIM2, MIN YOUNG SEO1, HWAMIN LEE2, SOON YOUNG KWON1
Objective: Laryngoscopy imaging is essential for patients presenting with vocal cord lesions. However, various factors, such as reduced patient compliance due to coughing and fogging on the endoscope surface, often result in low-quality endoscopic images in which the lesions cannot be identified. These low-quality images do not aid subsequent evaluations of the patient's history and unnecessarily consume data storage. Additionally, the time required to identify high-quality endoscopic images for inclusion in medical records delays medical consultations. This study aims to develop a model that automatically classifies high-quality laryngeal endoscopic images using a recent AI architecture, the Swin Transformer.

Methods: This study was based on laryngoscopy images captured between 2010 and 2023 from patients at the Department of Otolaryngology, Korea University Ansan Hospital, who had confirmed diagnosis codes for vocal cord diseases. These images included not only laryngeal images but also images of various lesions captured during the examination. Each image was reviewed by an otolaryngology specialist and classified into three quality levels from the perspective of diagnosing vocal cord lesions. Pre-trained models, including ResNet50, Vision Transformer, and Swin Transformer (versions 1 and 2), were applied to this dataset via transfer learning to categorize the images into the three quality levels. Additionally, a binary classification model was developed to isolate only high-quality images. The final models were evaluated with an Explainable AI (XAI) technique, Grad-CAM, to verify whether the models made appropriate judgments.

Results: A total of 4,711 laryngeal endoscopic images were collected, of which 3,099 (65.8%) were low-quality, 698 (14.8%) were medium-quality, and 914 (19.4%) were high-quality images.
Among the applied models, the Swin Transformer (version 1)-based model with data augmentation exhibited the best performance, achieving an accuracy of 0.8602, precision of 0.8582, recall of 0.8602, F1 score of 0.8587, AUROC of 0.9558, and AUPRC of 0.9023. For binary classification of high-quality images, the best results were obtained when images were first categorized into the three quality levels and the predicted low- and medium-quality images were then merged into a single category. This approach achieved an accuracy of 0.9513, precision of 0.8485, recall of 0.9130, F1 score of 0.8796, AUROC of 0.9791, and AUPRC of 0.9270. Applying Grad-CAM confirmed that the model determines quality by focusing on vocal cord lesions, performing classification in the same manner as clinical physicians and thereby operating with medical validity. The Swin Transformer (version 2)-based model achieved comparable performance, but its Grad-CAM results revealed a mechanism of operation that was not medically valid.

Conclusion: In this study, a high-performance model was developed that classifies high-quality laryngeal endoscopic images in a medically explainable manner. Future integration of this model into PACS and EMR systems, automating the insertion of high-quality laryngoscopy images into medical records, could reduce consultation delays and enable more efficient data management.
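The two-stage binary scheme above (classify into three levels, then merge low and medium into a single "not high" category) can be sketched in plain Python. The label encoding (0 = low, 1 = medium, 2 = high), the function names, and the toy data are all assumptions for illustration; they are not taken from the study.

```python
def to_binary(labels):
    """Collapse three-level quality labels to binary: high (2) -> 1, else 0."""
    return [1 if y == 2 else 0 for y in labels]

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for the positive (high) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Illustrative toy labels (0 = low, 1 = medium, 2 = high); not study data.
three_class_true = [0, 1, 2, 2, 0, 2, 1, 0]
three_class_pred = [0, 2, 2, 2, 1, 0, 1, 0]

y_true = to_binary(three_class_true)
y_pred = to_binary(three_class_pred)
acc, prec, rec, f1 = binary_metrics(y_true, y_pred)
```

Merging the two lower quality levels after the three-class prediction, rather than training a binary model directly, lets the model first learn the finer quality distinctions before the coarser high/not-high decision is derived from them.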

