A Density-Based Method for Scalable Outlier Detection in Large Datasets
Thesis posted on 2020-08-01, 00:00, authored by Matteo Corain
DBSCAN is one of the best-known algorithms in the field of density-based clustering, although its applicability to large datasets is generally disputed because of its high computational complexity. The aim of this work is to propose a new, parallel, Spark-based procedure dedicated solely to anomaly detection, in a way that is consistent with the DBSCAN definition and suitable for the big data context. On the theoretical side, the algorithm has a worst-case performance bound that depends linearly on the size of the dataset; in practical tests, it outperforms available solutions in both result quality and overall scalability as the data grow large.
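To make the notion of an anomaly under the DBSCAN definition concrete, the following minimal sketch flags the points that standard DBSCAN would label as noise: points that are neither core points nor within distance `eps` of a core point. It is a brute-force, single-machine illustration, not the Spark-based procedure proposed in the thesis; the function name `dbscan_outliers` and the sample data are assumptions made for this example. The pairwise distance computation is quadratic in the number of points, which is the kind of cost that limits DBSCAN on large datasets.

```python
from math import dist


def dbscan_outliers(points, eps, min_pts):
    """Return the indices of the points that DBSCAN would label as noise.

    Hypothetical helper for illustration only: brute-force O(n^2) neighbor
    search, Euclidean distance, neighborhoods that include the point itself.
    """
    n = len(points)
    # eps-neighborhood of every point (each point counts as its own neighbor).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A core point has at least min_pts points in its eps-neighborhood.
    core = [len(nb) >= min_pts for nb in neighbors]
    # Noise: not a core point and not within eps of any core point.
    return [
        i
        for i in range(n)
        if not core[i] and not any(core[j] for j in neighbors[i])
    ]


if __name__ == "__main__":
    # Five points on a line: the first four form a dense run, the last is isolated.
    sample = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
    print(dbscan_outliers(sample, eps=1, min_pts=3))  # prints [4]
```

In the sample, the two end points of the dense run are not core points themselves, but each lies within `eps` of a core point, so they count as border points rather than outliers; only the isolated point is reported.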
Degree Grantor: University of Illinois at Chicago
Degree name: MS, Master of Science
Committee Member: Cruz, Isabel; Garza, Paolo
Submitted date: August 2020