University of Illinois Chicago
Browse

Relational Causal Discovery with Cycles

thesis
posted on 2022-12-01, 00:00 authored by Ragib Ahsan
Most of the tools and methods developed for causal discovery rely on a graphical representation based on Bayesian networks which assumes independent and identically distributed (i.i.d) instances. Probabilistic relational models have been developed to relax this assumption. The key advantage of these relational models is that they can represent systems involving multiple types of entities interacting with each other with some probabilistic dependence. Causal reasoning over such relational systems is key to understanding many real-world phenomena, such as social influence. Influence in complex dynamic systems is often mutual and represented by a feedback loop or cycle in the relational model. Identifying mutual influence in relational models is of great interest in the research community. For example, social scientists and marketing experts are interested in studying the social dynamics between people and products in social networks. However, there is a lack of available methods for discovering mutual influence or cycles in complex relational systems. Most of the works on causal structure learning assume that the observational data samples are identically and independently distributed (i.i.d) and as a result are not directly applicable to relational models. At the same time, existing causal discovery algorithms for relational causal models assume acyclicity and rely on prior domain knowledge for conditional independence tests. The primary objective of this thesis is to address the deficiencies of existing relational causal discovery approaches by developing tools and methods for practical application of causal discovery on complex relational systems with feedback loops or cycles. In my thesis, I develop 𝜎-abstract ground graph (𝜎-AGG), a sound and complete representation for cyclic relational causal models which can capture conditional independence relationships consistent across all possible instantiations of the model by an operator called relational sigma-separation. Based on this new representation and theoretical guarantees, I define relational acyclification- an important property of cyclic relational causal models that helps identifying such models from observational data. I prove that under certain assumptions and conditions, the existing relational causal discovery algorithm (RCD) is sound and complete for cyclic relational causal models. Experimental evaluations conducted on publicly available datasets and synthetically generated datasets support my theoretical contributions. The state-of-the-art conditional independence (CI) test for relational data depends on domain knowledge about the type of dependence and lacks convergence guarantees which makes them unsuitable for large-scale real-world applications. In my thesis, I formalize a general notion of relational dependence and develop a consistent CI test method based on kernel mean embeddings that can capture complex dependence functions over node neighborhoods. Causal structure learning is NP-hard and constraint-based algorithms reduce this complexity by learning the local structure of each variable first. I conducted a comprehensive study of sampling strategies for relational classification which can help identify the important variables in local structures. I introduced a novel sampling method based on Weisfeiler-Lehman isomorphism that provides competitive predictive accuracy for one-shot active learning in relational classification.

History

Advisor

Zheleva, Elena

Chair

Zheleva, Elena

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

  • Doctoral

Degree name

PhD, Doctor of Philosophy

Committee Member

DiEugenio, Barbara Ziebart, Brian Tulabandhula, Theja Arbour, David

Submitted date

December 2022

Thesis type

application/pdf

Language

  • en

Usage metrics

    Categories

    No categories selected

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC