CORAIN-THESIS-2020.pdf (17.28 MB)
A Density-Based Method for Scalable Outlier Detection in Large Datasets
thesis
posted on 2020-08-01, 00:00 authored by Matteo Corain
DBSCAN is one of the best-known algorithms in the field of density-based clustering, although its applicability to large datasets is generally disputed because of its high complexity. This work proposes a new parallel, Spark-based procedure devoted solely to anomaly detection, consistent with the DBSCAN definition and suited to the big data context. On the theoretical side, the algorithm has a worst-case performance bound that depends linearly on the size of the dataset; in practical tests, it outperforms available solutions in both result quality and overall scalability as the data grow large.
Advisor
Asudeh, Abolfazl
Chair
Asudeh, Abolfazl
Department
Computer Science
Degree Grantor
University of Illinois at Chicago
Degree Level
- Masters
Degree name
MS, Master of Science
Committee Member
Cruz, Isabel; Garza, Paolo
Submitted date
August 2020
Thesis type
application/pdf
Language
- en