Mining Large Graphs
thesis
posted on 2013-06-28, 00:00 authored by Yuchen Zhao
The rapid growth of social networks, Internet applications, and communication networks has increased the need for mining graphs. Graphs are ubiquitous in these real-world applications and contain a tremendous amount of useful information. In this thesis, we study graph structures and apply the knowledge gained from them to a number of fundamental data mining tasks. We first propose a hash-based compression framework to cluster graph objects efficiently and effectively in the stream scenario. We then extend it to the graph clustering problem with side information. We propose a novel optimization framework, DMO, which can dynamically optimize the weights of the graph distance and side information distance metrics. The hash-based compression framework consumes constant storage space, and the mining process scales to massive graphs with side attributes. We then study graph structures from another perspective, namely positive and unlabeled learning in graphs. We derive an evaluation criterion to estimate the dependency between structural features and labels, and then propose an integrated approach that concurrently updates both graph feature selection and class label assignment. Experimental results show that, by using structural features from graph objects, the proposed integrated framework significantly outperforms previous methods. Because graph structures are very useful for understanding the nature of graphs, we further extend our analysis to online social networks. We explore five social principles and concepts that represent a variety of network characteristics and quantify their relations with social roles and statuses. We propose a novel probabilistic model, SRS, which integrates both the local social factors of individual users and network influence via neighbors in a principled way.