Documentation - C API

pegasos.h File Reference

PEGASOS SVM. More...

#include "generic.h"

Go to the source code of this file.


Detailed Description

pegasos.h provides basic impelmentation of the PEGASOS [1] SVM solver.

Overview

PEGASOS solves the linear SVM learning problem

\[ \min_{w} \frac{\lambda}{2} \|w\|^2 + \frac{1}{m} \sum_{i=1}^n \ell(w; (x_i,y_i)) \]

where $ x_i $ are data vectors in $ \mathbb{R}^d $, $ y_i \in \{-1,1\} $ are binary labels, $ \lambda > 0 $ is the regularization parameter, and

\[ \ell(w;(x_i,y_i) = \max\{0, 1 - y_i \langle w,x_i\rangle\}. \]

is the hinge loss. The result of the optimization is a model $ w \in \mathbb{R}^d $ that yields the decision function

\[ F(x) = \operatorname{sign} \langle w, x\rangle. \]

It is well known that the hinge loss is a convex upper bound of the 01-loss of the decision function:

\[ \ell(w;(x,y)) \geq \frac{1}{2}(1 - y F(x)). \]

PEGASOS is accessed by calling vl_pegasos_train_binary_svm_d or vl_pegasos_train_binary_svm_f, operating respectively on double or float data.

Bias

PEGASOS SVM formulation does not incorporate a bias. To learn an SVM with bias, the vector $ x $ can be optionally extended by a constant component $ B $. In this case, the model $ w $ is $d + 1 $ dimensional and the decision function is $ F(x) = \operatorname{sign} (\langle w_{1:d}, x\rangle+ w_{d+1} B) $. If B is large enough, the weight $ w_{d+1} $ remains small and it has small contribution in the regularization term $ \| w \|^2 $; however, a large $ B $ makes the optimization harder.

Non-linear kernels

PEGASOS can be extended to work with kernels, but the algorithm is not particularly efficient in this setting. A preferable solution may be to compute an explicit feature map representing the kernel.

Let $ k(x,y) $ be a kernel. A feature map is a function $ \Psi(x) $ such that $ k(x,y) = \langle \Psi(x), \Psi(y) \rangle $. Using this representation, that exists for any positive definite kernel as long as one accepts infinite dimensional feature maps, non-linear SVM learning writes:

\[ \min_{w} \frac{\lambda}{2} \|w\|^2 + \frac{1}{m} \sum_{i=1}^n \ell(w; (\Psi(x)_i,y_i)). \]

Thus the only difference with the linear case is that $ \Psi(x) $ is used in place of $ x $.

The ability of computing a feature-map representation depends on the kernel. A general solution is to take the (incomplete) Cholesky decomposition $ V^\top V $ of the kernel matrix $ K = [k(x_i,x_j)] $ (in this case $ \Psi(x_i) $ is the i-th columns of V). Alternatively, for additive kernels (e.g. intersection, Chi2) homkermap.h can be used.

Algorithm

PEGASOS is a stochastic subgradient optmizer with automatically tuned learning rate. At the t-iteration, the algorithm:

  • Samples uniformly at random as subset $ A_t$ of k of training pairs $(x,y)$ from the m pairs provided.
  • Compute a subgradient $ \nabla_t $ of the function $ E_t(w) = \frac{1}{2}\|w\|^2 + \frac{1}{k} \sum_{(x,y) \in A_t} \ell(w;(x,y)) $.
  • Do a step $ w_{t+1/2} = w_t - \alpha_t \nalba_t $ with learning rate $ \alpha_t = 1/(\eta t) $.
  • Back project on the hypersphere of radius $ \sqrt{\lambda} $ to obtain the next model estimate $ w_{t+1} $:

    \[ w_t = \min\{1, \sqrt{\lambda}/\|w\|\} w_{t+1/2} \]

References

[1] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-GrAdient SOlver for SVM. In Proc. ICML, 2007.

Author:
Andrea Vedaldi

Definition in file pegasos.h.