University of Illinois Chicago
Browse

Designing Compact On-Disk Index Data Structures

thesis
posted on 2025-08-01, 00:00 authored by Wenshao Zhong
The performance of a database system heavily depends on its storage engine. The design of on-disk index data structure is critical to the storage engine performance. Many database systems have chosen B+-trees as the on-disk index data structure. A B+-tree organizes every internal node in a block of the I/O unit so that the reads would incur minimum I/Os. However, writes in B+-trees are slow because a single write often causes multiple blocks to be rewritten due to the need to maintain its structure. To enable the support for more efficient writes, many database systems start to adopt LSM-trees in their storage engines. While an LSM-tree achieves efficient writes by batching small updates to eliminate in-place rewrites, reads in LSM- trees often require multiple I/Os to read the data scattered across a number of files on the disk which makes its reads much slower than B+-trees’. For a long time, reads and writes have been perceived to be a trade-off in designing on- disk index data structures. It still remains a challenge to achieve efficient reads and writes at the same time. In this thesis, we explore the idea of using compact indexes on on-disk data structures that can achieve efficient reads and writes. In this thesis, we introduce three works that progressively achieve more efficient index data structure designs. The first work REMIX compactly records and replay the scanning paths to achieve more efficient scanning in LSM- trees. The second work Disco, built on top of REMIX, achieves more efficient use of I/O for point and range queries. Disco guarantees that a point query issues at most one I/O while a short range query needs at most two I/Os, achieving an I/O efficiency close to a B+-tree. The third work extends the design of Disco to achieve an optimal use of I/O for on-disk data structures.

History

Language

  • en

Advisor

Jakob Eriksson

Department

Computer Science

Degree Grantor

University of Illinois Chicago

Degree Level

  • Doctoral

Degree name

PhD, Doctor of Philosophy

Committee Member

Xingbo Wu Chris Kanich Abolfazl Asudeh Stavros Sintos

Thesis type

application/pdf

Usage metrics

    Categories

    No categories selected

    Keywords

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC