A Density-Based Method for Scalable Outlier Detection in Large Datasets

Corain, Matteo

doi:10.25417/uic.13475433.v1

thesis

Posted on 2020-08-01, 00:00. Authored by Matteo Corain.

DBSCAN is one of the best-known algorithms in the field of density-based clustering, although its applicability to large datasets is generally disputed because of its high computational complexity. This work proposes a new, parallel, Spark-based procedure devoted solely to anomaly detection, consistent with the DBSCAN definition and suited to the big data context. From a theoretical standpoint, the algorithm has a worst-case performance bound that depends linearly on the size of the dataset; in practical tests, it outperforms available solutions in both result quality and overall scalability as the data grow large.

History

Advisor

Asudeh, Abolfazl

Chair

Asudeh, Abolfazl

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

Masters

Degree name

MS, Master of Science

Committee Member

Cruz, Isabel; Garza, Paolo

Submitted date

August 2020

Thesis type

application/pdf

Language

en

Keywords

big data; data science; outlier detection; DBSCAN

Licence

In Copyright
