University of Illinois Chicago

A Model-Distributed Inference Approach for Large Language Models at the Edge

Thesis
Posted on 2024-08-01, authored by Davide Macario
We present an implementation of Model-Distributed Inference for Large Language Models ("MDI-LLM"), a framework for deploying state-of-the-art LLMs across a network of low-power edge devices. The model's layers are partitioned into chunks, each assigned to a different node, and the nodes exchange intermediate activations over wireless links. To optimize this process, we introduce "recurrent pipeline parallelism," a scheduling technique that minimizes per-device idle time and enables parallel inference when generating multiple pieces of text. By pooling the computational resources of several edge devices, MDI-LLM can run models too large to fit on any single edge device, enabling inference on inexpensive hardware. Moreover, increasing the number of cooperating devices raises the token generation rate while reducing per-device memory consumption.
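
As a concrete illustration of the two ideas in the abstract, the toy Python sketch below simulates, on a single machine, how a stack of transformer layers could be split into contiguous chunks, one per edge node, and how several generation streams could be cycled through the resulting pipeline so that no node sits idle. The names (Node, partition_layers, recurrent_pipeline) and the round-robin scheduling loop are hypothetical simplifications for illustration, not the thesis implementation, which runs each chunk on a separate device and sends activations over wireless links.

from collections import deque
from dataclasses import dataclass

@dataclass
class Node:
    """One edge device holding a contiguous slice of the model's layers."""
    name: str
    layers: list  # stand-in layer callables; a real node would hold NN modules

    def forward(self, activation):
        # Run only the locally held layers; in a real deployment the result
        # would be transmitted to the next node over a wireless link.
        for layer in self.layers:
            activation = layer(activation)
        return activation

def partition_layers(layers, num_nodes):
    """Split the layer list into num_nodes contiguous, near-equal chunks."""
    chunk = -(-len(layers) // num_nodes)  # ceiling division
    return [layers[i:i + chunk] for i in range(0, len(layers), chunk)]

def recurrent_pipeline(nodes, streams, steps):
    """Cycle several generation streams through the node pipeline.

    While node k works on one stream, node k-1 can already process the
    next one, which is what keeps per-device idle time low; here that
    concurrency is only mimicked by round-robin scheduling.
    """
    queue = deque(enumerate(streams))
    for _ in range(steps):                # one step ~ one token per stream
        for _ in range(len(queue)):
            idx, activation = queue.popleft()
            for node in nodes:            # full pass through all chunks
                activation = node.forward(activation)
            queue.append((idx, activation))
    return [act for _, act in sorted(queue)]

if __name__ == "__main__":
    toy_layers = [lambda x: x + 1 for _ in range(12)]   # dummy "layers"
    nodes = [Node(f"edge-{i}", c)
             for i, c in enumerate(partition_layers(toy_layers, num_nodes=3))]
    print(recurrent_pipeline(nodes, streams=[0, 100], steps=4))  # -> [48, 148]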

History

Advisor

Erdem Koyuncu

Department

Electrical and Computer Engineering

Degree Grantor

University of Illinois Chicago

Degree Level

  • Masters

Degree name

Master of Science

Committee Member

Hulya Seferoglu
Michela Meo

Thesis type

application/pdf

Language

  • en
