Modeling Big Data Variety with Graph Mining Techniques
Thesis posted on 2016-10-29, authored by Xiangnan Kong
Graphs are ubiquitous and have become increasingly important for modeling diverse kinds of objects. In many real-world applications, instances are represented not as feature vectors but as graphs with complex structures, e.g., chemical compounds, program flows, XML web documents and brain networks. One central issue in graph mining research is graph classification, which has a wide variety of real-world applications, e.g., drug activity prediction, toxicology tests and kinase inhibition. Real-world graph classification problems pose several major challenges:

1) Learning from graphs with multiple labels. For example, a chemical compound can inhibit the activities of multiple types of kinases, e.g., ATPase and MEK kinase, and a single drug molecule can have anti-cancer efficacy against multiple types of cancer.
2) Learning from a small number of labeled graphs. In many real-world applications, labels for graph data are very expensive or difficult to obtain, and creating a large training dataset can be too expensive, too time-consuming or even infeasible. For example, in molecular medicine, testing drugs' anti-cancer efficacy through pre-clinical studies and clinical trials requires considerable time, effort and resources, while copious amounts of unlabeled drugs or molecules are often available from various sources.
3) Learning from uncertain graphs. For example, in neuroimaging, the functional connectivity among different brain regions is highly uncertain. In such applications, each human brain can be represented as an uncertain graph instead of a certain graph.

In this thesis, we explore four different settings of graph classification: the multi-label setting, the semi-supervised setting, the active learning setting, and the uncertain graph setting. In the multi-label setting, each graph object can be assigned multiple labels.
In the semi-supervised and active learning settings, we explore two different ways to reduce the labeling cost of graph classification. In the uncertain graph setting, we explore how to incorporate the uncertainty information in the graph structure into graph classification.
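To make the notion of an uncertain graph more concrete, the sketch below assumes the commonly used independent-edge ("possible world") model: every edge carries a probability that it truly exists, and each way of keeping or dropping edges yields one ordinary, certain graph. This model, the toy brain-region names, the edge probabilities, and the helper functions `world_probability`, `all_possible_worlds` and `sample_world` are all illustrative assumptions, not the representation or method used in the thesis.

```python
import random
from itertools import product

# Hypothetical toy "brain network": nodes are brain regions, and each edge
# carries the (assumed) probability that the functional connection exists.
# Region names and probabilities are illustrative, not data from the thesis.
uncertain_edges = {
    ("region_a", "region_b"): 0.9,
    ("region_b", "region_c"): 0.4,
    ("region_a", "region_c"): 0.7,
}


def world_probability(edges, present):
    """Probability of one possible world (a certain graph that keeps the edges
    in `present` and drops the rest), assuming edges exist independently."""
    prob = 1.0
    for edge, p in edges.items():
        prob *= p if edge in present else 1.0 - p
    return prob


def all_possible_worlds(edges):
    """Enumerate every certain graph as a set of kept edges (2**|E| of them)."""
    edge_list = list(edges)
    for keep in product([False, True], repeat=len(edge_list)):
        yield {e for e, k in zip(edge_list, keep) if k}


def sample_world(edges, rng):
    """Draw one certain graph by flipping an independent biased coin per edge."""
    return {e for e, p in edges.items() if rng.random() < p}


if __name__ == "__main__":
    worlds = list(all_possible_worlds(uncertain_edges))
    total = sum(world_probability(uncertain_edges, w) for w in worlds)
    print(len(worlds), round(total, 10))  # 8 1.0
    # Expected edge count equals the sum of edge probabilities
    # (linearity of expectation): 0.9 + 0.4 + 0.7.
    expected = sum(len(w) * world_probability(uncertain_edges, w) for w in worlds)
    print(round(expected, 10))  # 2.0
    print(sample_world(uncertain_edges, random.Random(0)))
```

Enumerating all 2^|E| possible worlds is only feasible for tiny graphs such as this one; for realistic brain networks, drawing worlds with a function like `sample_world` is the practical alternative under the same independence assumption.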