posted on 2012-10-02, 00:00authored byXiangnan Kong, Philip S. Yu
Graph classification has been showing critical importance in a wide variety
of applications, e.g. drug activity predictions and toxicology analysis. Current research
on graph classification focuses on single-label settings. However, in many applications,
each graph data can be assigned with a set of multiple labels simultaneously. Extract-
ing good features using multiple labels of the graphs becomes an important step before
graph classification. In this paper, we study the problem of multi-label feature selec-
tion for graph classification and propose a novel solution, called gMLC, to efficiently
search for optimal subgraph features for graph objects with multiple labels. Different
from existing feature selection methods in vector spaces which assume the feature set
is given, we perform multi-label feature selection for graph data in a progressive way
together with the subgraph feature mining process. We derive an evaluation criterion
to estimate the dependence between subgraph features and multiple labels of graphs.
Then a branch-and-bound algorithm is proposed to efficiently search for optimal sub-
graph features by judiciously pruning the subgraph search space using multiple labels.
Empirical studies demonstrate that our feature selection approach can effectively boost
multi-label graph classification performances and is more efficient by pruning the sub-
graph search space using multiple labels.
Funding
This work is supported in part by NSF through grants IIS
0905215, DBI-0960443, OISE-0968341 and OIA-0963278.
History
Publisher Statement
Post print version of article may differ from published version. The original publication is available at springerlink.com; DOI:10.1007/s10115-011-0407-3