Network Based Sampling for Time Series Classification
Thesis posted on 2021-08-01, authored by Samuel T Harford
A time series is an ordered collection of data points recorded by observers or sensing devices. Time series classification aims to label time series instances based on previously seen, labeled examples. The two primary goals of time series classification are to obtain highly accurate results and to reduce computation time. 1-Nearest-Neighbor Dynamic Time Warping (1-NN DTW) is the most widely used classification method for time series and serves as a benchmark against which emerging techniques are compared. Several studies have shown that, for the task of time series classification, 1-NN DTW is hard to beat with respect to both goals. Although 1-NN DTW achieves accurate results, its processing cost is high. With the growing need to run machine learning algorithms on low-capacity edge devices, the processing requirements of classification algorithms must be reduced.

The focus of this dissertation is to reduce the processing time of time series classification algorithms. This is achieved through a method called Network Based Sampling (NBS), which limits the number of training examples used to classify each instance. NBS is a preprocessing attachment that can be used in conjunction with any classification algorithm. It works by analyzing the training data of a time series classification problem to rank each training instance. The ranking process first generates a network of all training instances based on their class labels and their Euclidean distances to one another. The most prominent instances are then determined through an analysis of the network connections. Using this ranking, the training set is sampled according to a user-specified percentage that corresponds to the average time savings. The effectiveness of NBS is tested as an attachment to 1-NN DTW time series classification.
The 85 time series benchmarks from the University of California, Riverside (UCR) archive are used as the data source for classification evaluation.
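The abstract does not specify how the network is built or how prominence is measured. The following Python sketch therefore fills in those details with assumptions, purely for illustration. It assumes that each training instance is linked to its `k` nearest same-class neighbors by Euclidean distance, that an instance's score is its in-degree (how many same-class instances point to it), and that each class keeps its top-scoring `keep_fraction` of instances, with at least one per class. The function names (`rank_instances`, `sample_training_set`, `classify_1nn_dtw`) and parameters are hypothetical and are not the dissertation's code. The DTW here is unconstrained (no warping window) and uses a squared pointwise cost.

```python
import numpy as np


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Unconstrained DTW (no warping window) with squared pointwise cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))


def rank_instances(X: np.ndarray, y: np.ndarray, k: int = 1) -> np.ndarray:
    """Hypothetical prominence score: the in-degree of each instance in a
    network where every instance links to its k nearest same-class
    neighbors under Euclidean distance (equal-length series assumed)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    score = np.zeros(len(X), dtype=int)
    for i in range(len(X)):
        same = np.flatnonzero(y == y[i])
        same = same[same != i]  # no self-edges
        nearest = same[np.argsort(dist[i, same])][:k]
        score[nearest] += 1  # edge i -> j raises j's in-degree
    return score


def sample_training_set(X, y, keep_fraction: float, k: int = 1):
    """Keep the top-scoring keep_fraction of each class (at least one)."""
    score = rank_instances(X, y, k)
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        n_keep = max(1, int(round(keep_fraction * len(idx))))
        order = idx[np.argsort(-score[idx], kind="stable")]  # highest first
        keep.extend(order[:n_keep].tolist())
    keep = np.sort(np.array(keep))
    return X[keep], y[keep]


def classify_1nn_dtw(X_train, y_train, query):
    """Label the query with the class of its DTW-nearest training instance."""
    dists = [dtw_distance(query, x) for x in X_train]
    return y_train[int(np.argmin(dists))]


# Toy usage: two classes of noisy sine and cosine waves, 10 series each.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 40)
X = np.vstack([np.sin(t) + 0.1 * rng.standard_normal(40) for _ in range(10)]
              + [np.cos(t) + 0.1 * rng.standard_normal(40) for _ in range(10)])
y = np.array([0] * 10 + [1] * 10)
Xs, ys = sample_training_set(X, y, keep_fraction=0.3)
print(len(Xs))  # 6 (3 retained per class)
print(classify_1nn_dtw(Xs, ys, np.sin(t)))
```

Because 1-NN DTW compares each query against every retained training instance, keeping a fraction of the training set reduces the number of DTW computations per query roughly in proportion. Under these assumptions, that proportionality is how a sampling percentage translates into average time savings.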