A Density-Based Method for Scalable Outlier Detection in Large Datasets

thesis
posted on 01.08.2020, 00:00 by Matteo Corain
DBSCAN is one of the best-known algorithms in the field of density-based clustering, although its applicability to large datasets is generally disputed because of its high computational complexity. The aim of this work is to propose a new, parallel, Spark-based procedure dedicated solely to anomaly detection, in a way that is consistent with the DBSCAN definition and suitable for the big data context. On the theoretical side, the algorithm has a worst-case performance bound that depends linearly on the size of the dataset; in practical tests, it outperforms available solutions in both result quality and overall scalability as the data grow large.

Advisor

Asudeh, Abolfazl

Chair

Asudeh, Abolfazl

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

Masters

Degree name

MS, Master of Science

Committee Member

Cruz, Isabel; Garza, Paolo

Submitted date

August 2020

Thesis type

application/pdf

Language

en