Heterogeneous Learning and Its Applications
Thesis posted on 2013-10-24, 00:00, authored by Xiaoxiao Shi
With the rapid growth of big data mining, multiple related data sources containing different types of features may be available for a given task. For instance, users’ profiles can be used to build recommendation systems; in addition, a model can use users’ historical behaviors and social networks to infer their interests in related products. We argue that it is desirable to use all available heterogeneous data sources collectively in order to build effective learning models. We call this framework heterogeneous learning. There are two main challenges in heterogeneous learning: (1) Learning from data with different statistical properties. For example, data drawn from different sources may violate the i.i.d. assumption, may have different feature spaces, may have different prediction labels (that is, different posterior distributions), or may exhibit any combination of these. (2) Learning from data with different structures. For example, some data sources contain traditional vector-based features (e.g., user profiles) while others contain relational graph data (e.g., social networks), or the data sources are chemical graphs with different structures. In this thesis, we explore these challenges from the perspectives of supervised learning, unsupervised learning, and feature projection, and apply the resulting methods to real-world problems. The applications include drug efficiency prediction, document classification, image classification, movie rating prediction, chemical graph classification, and collective classification, along with experiments on several datasets from the UCI repository. The results show that heterogeneous learning significantly improves learning accuracy in some applications. For example, in the task of drug efficiency prediction, a projection-based heterogeneous learning approach reduces the error rate by over 50%.