Bayesian Network Hybrid Learning Using a Parent Reducing Site-specific Mutation Rate Genetic Algorithm

2016-10-18T00:00:00Z (GMT) by Carlo Contaldi
Bayesian networks constitute a powerful framework for probabilistic reasoning and expert elicitation, capable of representing inner relationships underlying any kind of phenomenon in both causal and diagnostic directions. Motivated by the fact that Bayesian networks have been extensively used in a variety of research domains, I focused this thesis work on unsupervised Bayesian network structure learning: it is an ambitious as challenging task that, if accomplished, would pave the path for Bayesian modeling and would therefore contribute to provide further insight into countless state-of-the-art research topics. The problem of Bayesian network structure learning can be formalized as a search task and, from its inception back to 1980s, it has been tackled by means of two distinct strategies or their hybrid combination: a Constraint-Based method operates by progressively reducing the search space, whereas the Search and Score generic approach explores the search space guided by some knowledge-driven metric. This thesis work aims at providing a method able to learn the structure of the Bayesian network underlying a set of data samples, by focusing on problems with a limited amount of available data. In particular, the main contribution of this work is a Hybrid learning algorithm able first to reliably reduce the search space and then to exhaustively explore it, by taking advantage of data-informed expedients as well. The proposed approach involves a parameterized Genetic Algorithm in order to pursue the task: this metaheuristic has been chosen because of its efficient global search capabilities even across a very large search space and because of its flexibility and adaptability; on the other hand it consistently suffers from time and space complexity relatively to other search methods in the literature, and moreover its performance is heavily influenced by the choice of a large set of parameters. The research covered in this work is concerned with designing a series of hybrid methods on a build-up basis: they are provided with enhancements already existing in the literature but not yet applied to the Bayesian network structure learning topic and also with novel improvements able to further restrict the search space during the evolutionary process, in a data-driven manner. In the experimental chapter of this thesis it is possible to ascertain how presented algorithms allow to better address time and space complexity, sensitivity to parameters setting issues as well as the problem of data fragmentation, with the advantage of higher performances in some cases.