University of Illinois Chicago

TIGER: Testing and Improving Generated Code with LLMs

thesis
posted on 2025-08-01, 00:00 authored by Lorenzo Gallone
This thesis presents a test-driven framework for enhancing the reliability of code generated by Large Language Models (LLMs), focusing on real-world applicability and minimal developer assistance. The system is designed to simulate a realistic development environment where no ground-truth implementations are available to the model, relying exclusively on textual artifacts such as documentation, docstrings, and test outcomes. This constraint ensures that every generated function is derived from semantic understanding rather than replication or pattern-matching.

A core innovation of this work is the integration of an iterative refinement loop, which introduces structured feedback into the code generation process. After producing an initial function from a natural language prompt, the model’s output is immediately tested. If failures occur, relevant error signals are extracted and used to update the prompt, allowing the model to revise its solution. This loop continues until the implementation passes all associated tests or a retry limit is reached. The system thus mirrors a human-like workflow of test-driven development and debugging. To assess the contribution of this iterative process, the same framework is also evaluated in a non-iterative configuration, where each function is generated only once from its prompt and tested without revision.

The evaluation is conducted on entire Python repositories rather than isolated functions, making the task significantly more complex. Functions are embedded in larger software structures, depend on shared state or class behavior, and are often indirectly tested through multi-layered scenarios. The system parses these repositories to extract structural metadata, resolve function-to-test mappings, and build context-aware prompts that support both initial generation and iterative correction.

The results demonstrate that embedding LLMs into a feedback-rich environment substantially increases their capacity to produce robust, test-passing code. Despite the added computational cost, the iterative approach leads to higher success rates across a diverse range of codebases, showing that language models, when guided by empirical signals and properly contextualized, can evolve from static generators into adaptive agents capable of producing functionally correct and maintainable code.
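To make the iterative refinement loop described above concrete, the sketch below is a minimal Python approximation of such a generate-test-repair cycle, not the implementation from the thesis: the generate callable, the refine and run_tests helpers, the prompt wording, and the default retry limit are assumptions introduced here for illustration, and running it assumes pytest is installed.

import subprocess
import tempfile
from dataclasses import dataclass


@dataclass
class Attempt:
    code: str
    passed: bool
    feedback: str


def run_tests(candidate_code: str, test_code: str) -> tuple[bool, str]:
    # Write the candidate implementation and its tests into one temporary
    # module, then run pytest on it; the captured output becomes the error
    # signal fed back into the next prompt when the tests fail.
    with tempfile.NamedTemporaryFile("w", suffix="_test.py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code)
        path = f.name
    result = subprocess.run(["pytest", "-q", path], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def refine(generate, prompt: str, test_code: str, max_retries: int = 3) -> Attempt:
    # Generate-test-repair loop: keep regenerating until the tests pass or
    # the retry limit is exhausted. `generate` is any callable mapping a
    # prompt string to candidate source code (e.g. a wrapper around an LLM API).
    feedback = ""
    attempt = Attempt(code="", passed=False, feedback="")
    for _ in range(max_retries + 1):
        full_prompt = prompt if not feedback else (
            prompt
            + "\n\nThe previous attempt failed these tests:\n"
            + feedback
            + "\nRevise the implementation so that all tests pass."
        )
        code = generate(full_prompt)
        passed, output = run_tests(code, test_code)
        attempt = Attempt(code=code, passed=passed, feedback=output)
        if passed:
            break
        # Keep only the tail of the test output so the feedback fits a prompt budget.
        feedback = output[-2000:]
    return attempt

With max_retries=0 the same call reduces to the non-iterative configuration described above: a single generation followed by one test run.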

Language

  • en

Advisor

Venkatesan Natarajan Venkatakrishnan

Department

Computer Science

Degree Grantor

University of Illinois Chicago

Degree Level

  • Masters

Degree name

MS, Master of Science

Committee Member

  • Rigel Gjomemo
  • Chris Kanich
  • Stefano Scanzio

Thesis type

application/pdf
