University of Illinois at Chicago

A Density-Based Method for Scalable Outlier Detection in Large Datasets

thesis
posted on 2020-08-01, 00:00 authored by Matteo Corain
DBSCAN is one of the best-known algorithms in the field of density-based clustering, although its applicability to large datasets is generally disputed because of its high computational complexity. The aim of this work is to propose a new, parallel, Spark-based procedure aimed solely at anomaly detection, in a way that is consistent with the DBSCAN definition and suitable for the big data context. On the theoretical side, the algorithm has a worst-case performance bound that depends linearly on the size of the dataset; in practical tests, it outperforms available solutions in terms of both result quality and overall scalability as the data grow large.
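To make the DBSCAN notion of an outlier concrete, the following is a minimal, sequential sketch of the standard definition. It is not the thesis's parallel Spark procedure. A point is a core point if at least `min_pts` points (itself included) lie within distance `eps`. A non-core point within `eps` of some core point is a border point, and every remaining point is noise, i.e. an outlier. The function name `dbscan_outliers` and the parameter names `eps` and `min_pts` are hypothetical choices for illustration. The brute-force neighbour search is quadratic in the number of points, which illustrates the scalability problem the abstract refers to.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dbscan_outliers(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return the indices of points that DBSCAN would label as noise.

    Hypothetical brute-force O(n^2) sketch, not the thesis's algorithm.
    """
    n = len(points)
    # eps-neighbourhood of each point: all j with dist(i, j) <= eps, i included.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Core points have at least min_pts points in their eps-neighbourhood.
    is_core = [len(nb) >= min_pts for nb in neighbours]
    outliers = []
    for i in range(n):
        if is_core[i]:
            continue
        # A non-core point with a core neighbour is a border point, not noise.
        if not any(is_core[j] for j in neighbours[i]):
            outliers.append(i)
    return outliers


if __name__ == "__main__":
    data = [
        (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # dense group (core points)
        (0.25, 0.0),                                     # border point
        (5.0, 5.0),                                      # isolated point
    ]
    print(dbscan_outliers(data, eps=0.2, min_pts=4))  # prints [5]
```

Here the point at `(0.25, 0.0)` has only three neighbours, so it is not a core point. It lies within `eps` of core points, however, so it is kept as a border point, and only the isolated point at index 5 is reported as an outlier.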

History

Advisor

Asudeh, Abolfazl

Chair

Asudeh, Abolfazl

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

  • Masters

Degree name

MS, Master of Science

Committee Member

Cruz, Isabel; Garza, Paolo

Submitted date

August 2020

Thesis type

application/pdf

Language

  • en
