Numerical Optimization as a Means to Symbolic Regression Program Synthesis

Cerny, Brian M.

Numerical Optimization as a Means to Symbolic Regression Program Synthesis

thesis

posted on 2014-04-15, 00:00 authored by Brian M. Cerny

Over the years there has been an increasing interest in probabilistically oriented Evolutionary Algorithms (EAs), but it has not been until recently that these innovative methods have been collectively recognized and achieved an independent status. By eliminating the traditionally employed genetic operators, these probabilistic EAs have been forced to adopt an alternative approach, and in the case of Estimation of Distribution Algorithms (EDAs), probabilistic graphical models have become the favored substitute. In the first major contribution of thesis, the proposal is made to utilize a probabilistic model that has been previously overlooked in the EDA literature, namely Hidden Markov Models (HMMs). But preferring not to completely abandon the biologically inspired genetic operations, the classical learning algorithms used to train HMMs are largely ignored, and instead, Differential Evolution (DE) is used to evolve the underlying constrained numerical parameters of the chosen probabilistic model. The evolved HMMs are then used to generate likely Prefix Gene Expression Programming (PGEP) chromosomes which encode candidate solutions, and thus provide feedback to guide this proposed evolutionary search process. After the algorithm is described in detail, benchmarking results on a set of Symbolic Regression (SR) problems are reported and this novel approach is compared to the original PGEP method. Besides investigating probabilistic model driven representations for SR, one other problem that has plagued Genetic Programming (GP) and its derivatives like PGEP, is also investigated herein. That being, numerical Constant Creation (CC) in SR. Given a mathematical formula expressed as a tree structure, the leaf nodes are either variables or constants. Such constants are usually unknown in SR problems, and GP, as well as many of its derivatives, lack the ability to precisely approximate these values. This is due to the inherently discrete encoding of GP-like methods which are more suited for combinatorial searches than real-valued optimization tasks. Previously, several attempts have been made to resolve this issue, and the dominant solutions have been to either embed a real-valued local optimizer or to develop additional numerically oriented genetic operators. In the second and final major contribution of this thesis, an entirely new and unified approach to SR with CC is proposed. Again, through the adoption of the robust real-valued optimization algorithm known as DE, constants and PGEP programs will be simultaneously evolved in such a way that the values of the leaf nodes will be approximated as the tree structure is itself changing. Experimental results from several SR benchmarks are presented and analyzed. The results demonstrate the feasibility of the proposed algorithm and suggest that exotic or computationally expensive methods are not necessary for successful CC.

History

Advisor

Nelson, Peter C.

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

Masters

Submitted date

2011-08

Language

en

Issue date

2012-12-07

Usage metrics

Keywords

Genetic Programming Gene Expression Programming Prefix Gene Expression Programming Hidden Markov Model Symbolic Regression Differential Evolution Estimation of Distribution Algorithm Genetic Algorithm Symbolic Function Identification Evolutionary Computation Evolutionary Algorithm

Licence

In Copyright

Numerical Optimization as a Means to Symbolic Regression Program Synthesis

History

Advisor

Department

Degree Grantor

Degree Level

Submitted date

Language

Issue date

Usage metrics

Categories

Keywords

Licence

Exports