University of Illinois Chicago
Browse

Automata Minimization and Beyond: A Systematic Evaluation of DFA-based Pattern Matching

thesis
posted on 2025-05-01, 00:00 authored by Filippo Giovanni Del Nero
Pattern-matching based on Regular Expressions (REs) is crucial in several applications, including network intrusion detection, Deep Packet Inspection (DPI), genome analysis, spam filters, database queries, and natural language preprocessing. Finite State Automata (FSAs) are a representation with equivalent power to REs commonly employed to analyze patterns against data efficiently. FSAs can be divided into two macro categories: Deterministic Finite State Automata (DFAs) and Non-Deterministic Finite State Automata (NFAs). The former ensures bounded execution times at the cost of state-explosion issues – as they have a larger memory footprint – while the latter significantly reduces memory usage but demands higher throughput capability in the processing engine. In this context, numerous works aim at compressing DFAs to reduce their memory footprint while exploiting their attractive computation characteristics. These compressed DFAs often require modifying their execution algorithm to perform pattern matching on a data stream with new automata representation. However, most of these techniques focus on state and transition compression, potentially lacking a thorough evaluation of the impact the compression has when running the modified execution algorithms. This thesis proposes a new framework to systematically analyze two classes of highly effective DFA-compression algorithms stressing their impact on pattern-matching execution. Firstly, this thesis characterizes the most impacting DFA-based compression techniques regarding state compression, transition compression, and compression time. Then, it thoroughly assesses the memory footprint and execution time of pattern-matching applications exploiting these compression algorithms across modern benchmarks and architectures, thus enriching the analysis proposed in previous works. With this in-depth evaluation, this thesis clarifies the tradeoff between memory compression and the execution impact of DFA-based compression algorithms in a modern context, supporting the design of ad-hoc pattern matching engines.

History

Advisor

Wenjing Rao

Department

Computer Science

Degree Grantor

University of Illinois Chicago

Degree Level

  • Masters

Degree name

MS, Master of Science

Committee Member

Piotr Gmytrasiewicz Zhiling Lan Marco Santambrogio

Thesis type

application/pdf

Language

  • en

Usage metrics

    Dissertations and Theses

    Categories

    No categories selected

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC