University of Illinois Chicago
Browse

Analysis of presence-only data via semi-supervised learning approaches

Download (159.18 kB)
journal contribution
posted on 2013-11-22, 00:00 authored by Junhui Wang, Yixin Fang
Presence-only data occur in classification, which consist of a sample of observations from presence class and a large number of background observations with unknown presence/absence. Since absence data are generally unavailable, conventional semisupervised learning approaches are no longer appropriate as they tend to degenerate and assign all observations to presence class. In this article, we propose a generalized class balance constraint, which can be equipped with semi-supervised learning approaches to prevent them from degeneration. Furthermore, to circumvent the difficulty of model tuning with presence-only data, a selection criterion based on classification stability is developed, which measures the robustness of any given classification algorithm against the sampling randomness. The effectiveness of the proposed approach is demonstrated through a variety of simulated examples, along with an application to gene function prediction.

History

Publisher Statement

NOTICE: This is the author’s version of a work that was accepted for publication in Computational Statistics and Data Analysis. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Computational Statistics and Data Analysis, Vol 59, (2012) DOI: 10.1016/j.csda.2012.10.007

Publisher

Elsevier

Language

  • en_US

issn

0167-9473

Issue date

2013-03-01

Usage metrics

    Categories

    No categories selected

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC