Most of the tools and methods developed for causal discovery rely on a graphical representation based on Bayesian networks which assumes independent and identically distributed (i.i.d) instances. Probabilistic relational models have been developed to relax this assumption. The key advantage of these relational models is that they can represent systems involving multiple types of entities interacting with each other with some probabilistic dependence. Causal reasoning over such relational systems is key to understanding many real-world phenomena, such as social influence. Influence in complex dynamic systems is often mutual and represented by a feedback loop or cycle in the relational model. Identifying mutual influence in relational models is of great interest in the research community. For example, social scientists and marketing experts are interested in studying the social dynamics between people and products in social networks. However, there is a lack of available methods for discovering mutual influence or cycles in complex relational systems. Most of the works on causal structure learning assume that the observational data samples are identically and independently distributed (i.i.d) and as a result are not directly applicable to relational models. At the same time, existing causal discovery algorithms for relational causal models assume acyclicity and rely on prior domain knowledge for conditional independence tests.
The primary objective of this thesis is to address the deficiencies of existing relational causal discovery approaches by developing tools and methods for practical application of causal discovery on complex relational systems with feedback loops or cycles. In my thesis, I develop 𝜎-abstract ground graph (𝜎-AGG), a sound and complete representation for cyclic relational causal models which can capture conditional independence relationships consistent across all possible instantiations of the model by an operator called relational sigma-separation. Based on this new representation and theoretical guarantees, I define relational acyclification- an important property of cyclic relational causal models that helps identifying such models from observational data. I prove that under certain assumptions and conditions, the existing relational causal discovery algorithm (RCD) is sound and complete for cyclic relational causal models. Experimental evaluations conducted on publicly available datasets and synthetically generated datasets support my theoretical contributions. The state-of-the-art conditional independence (CI) test for relational data depends on domain knowledge about the type of dependence and lacks convergence guarantees which makes them unsuitable for large-scale real-world applications. In my thesis, I formalize a general notion of relational dependence and develop a consistent CI test method based on kernel mean embeddings that can capture complex dependence functions over node neighborhoods. Causal structure learning is NP-hard and constraint-based algorithms reduce this complexity by learning the local structure of each variable first. I conducted a comprehensive study of sampling strategies for relational classification which can help identify the important variables in local structures. I introduced a novel sampling method based on Weisfeiler-Lehman isomorphism that provides competitive predictive accuracy for one-shot active learning in relational classification.
History
Advisor
Zheleva, Elena
Chair
Zheleva, Elena
Department
Computer Science
Degree Grantor
University of Illinois at Chicago
Degree Level
Doctoral
Degree name
PhD, Doctor of Philosophy
Committee Member
DiEugenio, Barbara
Ziebart, Brian
Tulabandhula, Theja
Arbour, David