University of Illinois Chicago
Browse

Geometric Approximation Algorithms for Metric Learning and Poisoning

thesis
posted on 2023-05-01, 00:00 authored by Diego Hisashi Ihara Centurion
Machine learning tasks such as nearest neighbor and clustering depend on a good representation of the input data. A good representation is one where distance quantifies dissimilarity between pairs of data points. Metric learning aims to find such good representation by mapping input data to points in some metric space. However, the input data can be contaminated by the presence of adversarial data. For this reason, a desirable property of machine learning algorithms is robustness in the presence of such adversarial data. Several algorithms that are robust to small perturbations in the input data have been proposed. However, it is generally not clear how to provably compare the robustness of these algorithms. In this dissertation, we present approximation algorithms for learning Mahalanobis metric spaces even when a small fraction of the data set is adversarial, with theoretical guaranties. To that end, we consider a formulation of Mahalanobis metric learning as an optimization problem, where the objective is to minimize the number of violated similarity/dissimilarity constraints. We show that for any fixed ambient dimension, there exists a fully polynomial-time approximation scheme (FPTAS) with nearly-linear running time. Next, we present efficient algorithms for poisoning data sets against k-center and k-means clustering. Formally, given a set of points, $X$, our algorithms compute a small set of poison points, $P$, such that the performance of the chosen clustering method degrades as much as possible on $X\cup P$, compared to the original data set, $X$. We obtain efficient poisoning approximation algorithms that achieve $\frac{1}{6}$-approximation for $k$-center clustering and $(\Omega(1/k^2),2)$-approximation for $k$-means clustering compared to optimal poisoning.We corroborate our theoretical results with experimental evidence that our methods can be used to evaluate the robustness of clustering algorithms. Finally, we present an algorithm for poisoning datasets against k-NN classification by flipping a small portion of the labels through the application of multi-scale random partitions.

History

Advisor

Sidiropoulos, Anastasios

Chair

Sidiropoulos, Anastasios

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

  • Doctoral

Degree name

PhD, Doctor of Philosophy

Committee Member

DasGupta, Bhaskar Sun, Xiaorui Asudeh, Abolfazl Papakonstantinou, Periklis

Submitted date

May 2023

Thesis type

application/pdf

Language

  • en

Usage metrics

    Categories

    No categories selected

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC