Publication: Data-centric single teacher guided knowledge distillation for alleviating sub-optimal supervision in image classification
Type
Article
Date
2026-02-23
Publisher
Elsevier Ltd
Abstract
In recent years, advances in deep learning have produced ever larger, deeper, and more complex models, and computational costs have grown with model size. Knowledge distillation has therefore become a cornerstone of contemporary machine learning, enabling the transfer of knowledge from cumbersome teacher models to more compact student models. However, student learning is persistently challenged by sub-optimal supervision caused by erroneous and ambiguous teacher predictions, and it is further degraded by the noisy labels frequently encountered in real-world datasets. Existing methods often resort to ensembles of teachers, introducing additional complexity. We propose a novel, simple, and efficient learning method, Corrective Knowledge Distillation (CKD), that alleviates these drawbacks while relying on a single teacher model. The proposed method employs a two-phase learning paradigm: in the initial phase, the teacher selectively transfers only its most confident knowledge to the student, and in the subsequent phase, the student leverages its own past learning experiences, conditioning its knowledge acquisition on the teacher's guidance. CKD consistently exhibits superior performance in addressing sub-optimal supervision, as evidenced by comprehensive experiments on benchmark datasets such as CIFAR-100, CIFAR-100N-Fine, and ImageNet-1K. Notably, CKD surpasses established baselines, achieving accuracy gains of up to 3.53% in real-world scenarios, and it exhibits exceptional robustness in highly noisy environments, outperforming ensemble techniques by a margin of up to 5.18%. Our code is available at https://github.com/Karthick47v2/ckd.
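To make the first phase of the described paradigm concrete, the sketch below illustrates one plausible reading of "the teacher selectively transfers only its most confident knowledge": distillation is applied only to samples whose teacher top-1 softmax probability exceeds a threshold. This is a minimal illustration under stated assumptions, not the paper's implementation; the threshold `tau_conf`, temperature `T`, and weight `alpha` are hypothetical hyperparameters, and the second phase (the student conditioning on its own past predictions) is omitted.

```python
# Minimal sketch of confidence-gated distillation (phase one, as interpreted
# from the abstract). Assumes "extremely confident knowledge" means samples
# where the teacher's top-1 softmax probability exceeds a threshold; the
# actual CKD selection rule may differ (see the linked repository).
import torch
import torch.nn.functional as F

def confidence_gated_kd_loss(student_logits, teacher_logits, targets,
                             tau_conf=0.9, T=4.0, alpha=0.5):
    """Cross-entropy on all samples; KL distillation only on samples the
    teacher is confident about. tau_conf, T, alpha are illustrative values."""
    # Standard supervised loss on every sample in the batch.
    ce = F.cross_entropy(student_logits, targets)

    with torch.no_grad():
        teacher_probs = F.softmax(teacher_logits, dim=1)
        # Boolean mask: keep only samples the teacher predicts confidently.
        confident = teacher_probs.max(dim=1).values >= tau_conf

    if confident.any():
        # Temperature-scaled KL divergence on the confident subset only,
        # rescaled by T^2 as in standard knowledge distillation.
        kd = F.kl_div(
            F.log_softmax(student_logits[confident] / T, dim=1),
            F.softmax(teacher_logits[confident] / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
    else:
        # No confident teacher predictions in this batch: no distillation term.
        kd = student_logits.new_zeros(())

    return (1 - alpha) * ce + alpha * kd
```

The gating design matters because it simply withholds the distillation signal on ambiguous samples rather than correcting it, leaving the student supervised by the ground-truth labels alone there; how CKD handles those withheld samples in its second phase is described in the paper itself.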
Keywords
Confident learning, Knowledge distillation, Model compression
