University of Illinois Chicago

AI-Bind 2.0: Leveraging Pre-trained Language Model for Protein-Ligand Binding Prediction

thesis
posted on 2025-05-01, 00:00 authored by Matteo Negro
In recent years, machine learning and deep learning have reshaped computational biology and drug discovery, enabling data-driven insights into complex biological interactions. This thesis presents AI-Bind 2.0, a computational pipeline designed to predict protein-ligand binding with higher accuracy and robustness than previous models. Its core advance is the integration of pre-trained protein language models, specifically ProtTrans, a transformer-based architecture trained on vast biological datasets. By leveraging ProtTrans, AI-Bind 2.0 captures intricate dependencies within protein sequences, yielding more precise, contextually rich representations of protein-ligand interactions than earlier embedding techniques such as ProtVec. To validate its effectiveness, AI-Bind 2.0 was trained and tested on a rigorously curated dataset drawn from sources such as DrugBank, BindingDB, and Drug Target Commons, encompassing a balanced mix of binding and non-binding interactions. The evaluation used both transductive and inductive testing to measure the model’s generalization capabilities. Performance metrics, including the Area Under the Receiver Operating Characteristic Curve (AUROC) and the Area Under the Precision-Recall Curve (AUPRC), show notable improvements over its predecessor, particularly for interactions involving novel proteins and ligands. Additionally, the application of AI-Bind 2.0 to COVID-19 research highlights its potential to expedite drug discovery by accurately predicting binding interactions with key viral proteins, thus aiding the identification of promising therapeutic candidates.
The contributions of this research underscore the potential of advanced natural language processing techniques in bioinformatics, demonstrating that transformer-based models can effectively capture complex biological patterns critical to drug discovery. By establishing a more accurate and generalizable framework for predicting protein-ligand interactions, AI-Bind 2.0 offers a valuable tool for computational biology and opens new avenues for research in understanding and targeting molecular mechanisms.
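
The abstract does not define the transductive and inductive test settings. As a minimal sketch, assuming the common usage in which a transductive test pair has both its protein and its ligand present in the training data while an inductive test pair has neither, the two test sets could be built as follows. The function name `split_pairs`, the `test_fraction` parameter, and the toy identifiers are hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter


def split_pairs(pairs, test_fraction=0.3, seed=0):
    """Split (protein, ligand, label) triples into train, transductive, inductive.

    Assumed definitions (not taken from the thesis):
      - inductive test pairs: protein AND ligand both absent from training
      - transductive test pairs: protein AND ligand both present in training
    Pairs with exactly one held-out node are dropped in this sketch.
    """
    rng = random.Random(seed)
    proteins = sorted({p for p, _, _ in pairs})
    ligands = sorted({l for _, l, _ in pairs})

    # Hold out a subset of proteins and ligands entirely.
    held_p = set(rng.sample(proteins, max(1, int(len(proteins) * test_fraction))))
    held_l = set(rng.sample(ligands, max(1, int(len(ligands) * test_fraction))))

    candidates, inductive = [], []
    for pair in pairs:
        p, l, _ = pair
        if p in held_p and l in held_l:
            inductive.append(pair)
        elif p not in held_p and l not in held_l:
            candidates.append(pair)
        # mixed pairs (one node held out) are ignored here

    # Move some candidate pairs to the transductive test set, but only if the
    # protein and the ligand each still appear in another training pair.
    rng.shuffle(candidates)
    p_count = Counter(p for p, _, _ in candidates)
    l_count = Counter(l for _, l, _ in candidates)
    target = int(len(candidates) * test_fraction)

    train, transductive = [], []
    for pair in candidates:
        p, l, _ = pair
        if len(transductive) < target and p_count[p] > 1 and l_count[l] > 1:
            transductive.append(pair)
            p_count[p] -= 1
            l_count[l] -= 1
        else:
            train.append(pair)
    return train, transductive, inductive


# Toy data: 10 hypothetical proteins x 10 hypothetical ligands, arbitrary labels.
pairs = [(f"P{i}", f"L{j}", (i + j) % 2) for i in range(10) for j in range(10)]
train, transductive, inductive = split_pairs(pairs)
```

Scoring a model separately on the two test sets then indicates how much of its performance depends on having already seen the proteins and ligands during training.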

Advisor

Piotr Gmytrasiewicz

Department

Computer Science

Degree Grantor

University of Illinois Chicago

Degree Level

  • Masters

Degree name

MS, Master of Science

Committee Member

Zhiling Lan, Marco D. Santambrogio

Thesis type

application/pdf

Language

  • en
