Modeling Big Data Variety with Graph Mining Techniques

Kong, Xiangnan

Kong_Xiangnan.pdf (6.64 MB)

thesis

posted on 2016-10-29, 00:00 authored by Xiangnan Kong

Graphs are ubiquitous and have become increasingly important for modeling diverse kinds of objects. In many real-world applications, instances are represented not as feature vectors but as graphs with complex structures, e.g., chemical compounds, program flows, XML web documents, and brain networks. One central problem in graph mining research is graph classification, which has a wide variety of real-world applications, e.g., drug activity prediction, toxicology testing, and kinase inhibition. Real-world graph classification problems pose several major challenges:

1) Learning from graphs with multiple labels: For example, a chemical compound can inhibit the activities of multiple types of kinases, e.g., ATPase and MEK kinase; one drug molecule can have anti-cancer efficacy against multiple types of cancer.

2) Learning from a small number of labeled graphs: In many real-world applications, labels for graph data are expensive or difficult to obtain, and creating a large training dataset can be too costly, time-consuming, or even infeasible. For example, in molecular medicine, testing drugs' anti-cancer efficacy through pre-clinical studies and clinical trials requires substantial time, effort, and resources, while copious amounts of unlabeled drugs or molecules are often available from various sources.

3) Learning from uncertain graphs: For example, in neuroimaging, the functional connectivities among different brain regions are highly uncertain. In such applications, each human brain can be represented as an uncertain graph rather than a certain graph.

In this thesis, we explore four different settings of graph classification: the multi-label setting, the semi-supervised setting, the active learning setting, and the uncertain graph setting. In the multi-label setting, each graph object can be assigned multiple labels. In the semi-supervised and active learning settings, we explore two different approaches to reducing labeling costs in graph classification problems. In the uncertain graph setting, we explore how to incorporate the uncertainty information in the graph structure into graph classification.

History

Advisor

Yu, Philip S.

Department

Computer Science

Degree Grantor

University of Illinois at Chicago

Degree Level

Doctoral

Committee Member

Liu, Bing; Lillis, John; Wang, Junhui; Ragin, Ann B.

Submitted date

2014-08

Language

en

Issue date

2014-10-28

Keywords

Graph Mining; Data Mining; Big Data; Data Variety; Subgraph Pattern; Feature Selection; Uncertain Data; Drug Discovery; Brain Network

Licence

In Copyright
