Posted on 2025-05-01, 00:00; authored by Harshal Jagdishbhai Hirpara
Head and Neck Cancer (HNC) accounts for approximately 3–4% of all cancers worldwide, affecting regions such as the lips, tongue, throat, larynx, nose, and salivary glands. The primary risk factors for HNC are excessive tobacco and alcohol consumption and infection with Human Papillomavirus (HPV). Treatment typically follows a sequential three-stage approach, beginning with Definitive Surgery (DS), followed by Inductive Chemotherapy (IC), and concluding with either Radiotherapy (RT) alone or Radiotherapy with Concurrent Chemotherapy (RT/CC). However, treatment plans are highly patient-specific, and individual stages may be omitted based on the patient's condition and multidisciplinary medical decisions.
Determining an optimal Dynamic Treatment Regime (DTR) requires extensive collaboration among specialists, often involving multiple iterations to reach a consensus. To address this challenge, this study proposes a Deep Reinforcement Learning (DRL) framework to automate DTR planning by leveraging historical patient data. Using guided Policy Gradient (PG) methodologies, two approaches—Regularization and Fine Tuning—were explored, with the Behavior Cloning (BC) model serving as the guiding framework. Guidance ensures that the DRL models align with clinical decisions by penalizing deviations only when significant discrepancies occur.
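As a rough illustration of the guidance idea described above, the following minimal sketch adds a penalty to a policy-gradient loss only when the learned policy diverges from the BC policy by more than a tolerance. The abstract does not specify the form of the penalty, so the KL divergence, the hinge threshold, and every name here (`guided_pg_loss`, `threshold`, `weight`) are assumptions made for illustration, not the study's implementation.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def guided_pg_loss(policy_logits, bc_probs, action, advantage,
                   threshold=0.1, weight=1.0):
    """Policy-gradient loss for one decision step, plus a hinge penalty.

    Assumption: the penalty is weight * max(0, KL(BC || policy) - threshold),
    so small disagreements with the clinician-derived BC policy cost nothing.
    """
    probs = softmax(policy_logits)
    # Standard REINFORCE-style term: -log pi(a|s) * advantage.
    pg_term = -math.log(probs[action]) * advantage
    divergence = kl_divergence(bc_probs, probs)
    # Penalize only the part of the divergence that exceeds the tolerance.
    penalty = weight * max(0.0, divergence - threshold)
    return pg_term + penalty, divergence


# Hypothetical two-action step (e.g. RT alone vs. RT/CC).
bc_probs = [0.9, 0.1]
loss, div = guided_pg_loss([-5.0, 5.0], bc_probs, action=1, advantage=1.0)
print(f"divergence={div:.3f}, loss={loss:.3f}")
```

In a Fine Tuning variant, one would instead initialize the policy from the BC model's weights and continue training with the policy-gradient term; that reading is likewise an assumption based on the approach names alone.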
This study uses data from 676 patients diagnosed with HNC, all of whom received at least radiation therapy (RT) at MD Anderson Cancer Center (MDACC) between 2010 and 2021. The data consist of patients' medical records and include clinical, diagnostic, treatment-related, and patient-reported scores (PRS) information. The data were de-identified and collected at the University of Texas under institutional IRB approval and transferred to the University of Illinois Chicago (UIC) under a Material Transfer Agreement. As per the notice of determination of human subject research by the UIC Office for the Protection of Research Subjects, this dataset does not meet the definition of human subject research at UIC.
Further, different variations of the original dataset were created to analyze the impact of PRS on symptoms such as fatigue, pain, and nausea. These include:
A dataset where PRS was excluded before determining any treatment.
A modified dataset where individual PRS values were replaced with high- or low-burden clusters using the Symptom Burden Model (SBM) API developed at the University of Iowa.
Overall, this study presents a comprehensive analysis of DRL-based DTR models trained on all datasets, demonstrating the potential of DRL in optimizing personalized HNC treatment plans while reducing the need for extensive manual decision-making.