Adversarial Approach to Cost-sensitive Classification and Sequence Tagging
thesisposted on 2019-08-01, 00:00 authored by Kaiser Newaj Asif
In many machine learning applications, predicting incorrect classes or labels incurs different penalties depending on the predicted class and the actual class. Cost-sensitive classification formulates this situation by seeking predictions that minimize this variable loss. Since directly optimizing the empirical cost-sensitive loss is generally intractable, existing cost-sensitive methods minimize surrogate loss functions. For example, the support vector machine (SVM) uses the hinge loss. However, the SVM can fail to learn the cost-minimizing prediction for even in ideal learning conditions (i.e., it does not provide a Fisher consistency guarantee). On the other hand, Logistic Regression, which uses log-loss as the surrogate, is difficult to adapt to the cost-sensitive setting. Although it allows class importance weights to be incorporated, it cannot be adapted to the cost-sensitive setting with more than two classes. We formulate the cost-sensitive classification as a minimax game between a predictor and a hypothetical adversary who approximates the training data labels, but is constrained by some training data properties. We directly include the cost-sensitive loss measure instead of a surrogate loss in the formulation. Unlike empirical risk minimization the resulting optimization problem is convex, allowing us to efficiently solve it. We develop and apply this method for multiclass cost-sensitive classification with arbitrary cost matrices. We then extend this work to sequence tagging (multiple interrelated variables) where Hamming loss is considered as the cost of mismatch between the target sequence and the predicted sequence. Later we improve the sequence tagging algorithm for faster computation and compare the two methods. We discuss a real-world application to welding quality detection and activity recognition. We demonstrate that this adversarial approach is competitive with traditional methods, while having the theoretical benefit of consistency.