Continual learning (CL) is a machine learning paradigm in which an AI system learns a sequence of tasks while retaining the knowledge acquired along the way, so that it remains capable of performing all the tasks learned so far. Although humans possess a remarkable ability to acquire knowledge for a new task while still performing previously learned tasks well, this remains extremely difficult for neural networks. A central challenge of CL is catastrophic forgetting (CF), the phenomenon in which a system's performance on previously learned tasks degrades drastically after it learns a new task. Numerous methods have been proposed for the two main CL scenarios. The first is task-incremental learning (TIL), which assumes that the task identity of each test instance is given at inference time. The second is class-incremental learning (CIL), which assumes no task information at test time. This thesis focuses on CIL, which is especially challenging because the system must additionally establish decision boundaries between the classes of different tasks without access to previous task data during training or to task identities during testing.
In this thesis, we first conduct a theoretical study of how to solve CIL. We show that the CIL problem can be decomposed into two sub-problems: within-task prediction (WP), which is equivalent to TIL, and task-id prediction (TP). We further show that TP can be solved by out-of-distribution (OOD) detection and, conversely, OOD detection can be solved by TP. These findings establish necessary and sufficient conditions for CIL: a good CIL model yields good WP and TP performance, and good WP and TP together yield good CIL performance. Based on this decomposition, we also show that CIL is learnable. We then design several highly effective CIL techniques that demonstrate the practical value of the theoretical analysis. The first two, HAT+CSI and Sup+CSI, combine the existing OOD detection technique CSI with the TIL methods HAT and SupSup, respectively. Built on ResNet-18, they outperform strong baselines in both TIL and CIL by large margins. Two additional methods are Multi-head model for continual learning via OOD REplay (MORE) and Replay, OOD, and WP for CIL (ROW). Both use replay samples stored in a memory buffer to train OOD detection models, but ROW is more principled and stronger than MORE because it follows the decomposition directly. Both methods outperform strong baselines, remain effective with very few replay samples, and are naturally capable of detecting OOD instances while classifying the continually learned classes. We evaluate the proposed methods through extensive experiments on standard public benchmark datasets: MNIST, CIFAR, and Tiny-ImageNet.
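Concretely, the decomposition states that for a test instance x, the CIL probability of class j of task i factors as P(class (i, j) | x) = P(class j | x, task i) * P(task i | x), where the first factor is WP and the second is TP. Below is a minimal Python sketch of this product rule; the function name cil_predict, the per-task probability arrays, and the toy numbers are illustrative assumptions, not code from the thesis. In practice the TP factor would be derived from an OOD detector's scores normalized across tasks.

import numpy as np

# Minimal sketch of the CIL = WP x TP decomposition (illustrative, not thesis code).
# wp_probs[i] holds the within-task (WP) class probabilities P(class j | x, task i);
# tp_probs[i] holds the task-id prediction (TP) probability P(task i | x).
def cil_predict(wp_probs, tp_probs):
    """Return (task_id, class_id) maximizing P(class j | x, task i) * P(task i | x)."""
    best_score, best_task, best_class = -1.0, -1, -1
    for i, wp in enumerate(wp_probs):
        scores = wp * tp_probs[i]              # product rule for every class of task i
        j = int(np.argmax(scores))
        if scores[j] > best_score:
            best_score, best_task, best_class = float(scores[j]), i, j
    return best_task, best_class

# Toy usage: two tasks, two classes each (numbers are made up).
wp = [np.array([0.9, 0.1]), np.array([0.6, 0.4])]  # WP softmax outputs per task
tp = np.array([0.3, 0.7])                          # TP probabilities across tasks
print(cil_predict(wp, tp))                         # -> (1, 0): class 0 of task 1

Because the CIL score is a product of the WP and TP factors, improving either factor improves CIL, which is the intuition behind the necessary-and-sufficient conditions stated above.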
History
Advisor: Bing Liu
Department: Computer Science
Degree Grantor: University of Illinois Chicago
Degree Level: Doctoral
Degree Name: PhD, Doctor of Philosophy
Committee Members: Xinhua Zhang, Brian Ziebart, Sathya Ravi, Sahisnu Mazumder