Integer K-Means clustering. More...

#include "generic.h"
#include "random.h"

Data Structures
struct	VlIKMFilt
	IKM quantizer. More...

Typedefs
typedef vl_int32	vl_ikmacc_t

Enumerations
enum	VlIKMAlgorithms { VL_IKM_LLOYD, VL_IKM_ELKAN }
	IKM algorithms. More...

Functions
Create and destroy
VlIKMFilt *	vl_ikm_new (int method)
	Create a new IKM quantizer. More...

void	vl_ikm_delete (VlIKMFilt *f)
	Delete IKM quantizer. More...

Process data
void	vl_ikm_init (VlIKMFilt f, vl_ikmacc_t const centers, vl_size M, vl_size K)

void	vl_ikm_init_rand (VlIKMFilt *f, vl_size M, vl_size K)

void	vl_ikm_init_rand_data (VlIKMFilt f, vl_uint8 const data, vl_size M, vl_size N, vl_size K)

int	vl_ikm_train (VlIKMFilt f, vl_uint8 const data, vl_size N)
	Train clusters. More...

void	vl_ikm_push (VlIKMFilt f, vl_uint32 asgn, vl_uint8 const *data, vl_size N)
	Project data to clusters. More...

vl_uint	vl_ikm_push_one (vl_ikmacc_t const centers, vl_uint8 const data, vl_size M, vl_size K)
	Project one datum to clusters. More...

Retrieve data and parameters
vl_size	vl_ikm_get_ndims (VlIKMFilt const *f)
	Get data dimensionality. More...

vl_size	vl_ikm_get_K (VlIKMFilt const *f)
	Get the number of centers K. More...

int	vl_ikm_get_verbosity (VlIKMFilt const *f)
	Get verbosity level. More...

vl_size	vl_ikm_get_max_niters (VlIKMFilt const *f)
	Get maximum number of iterations. More...

vl_ikmacc_t const *	vl_ikm_get_centers (VlIKMFilt const *f)
	Get maximum number of iterations. More...

Set parameters
void	vl_ikm_set_verbosity (VlIKMFilt *f, int verb)
	Set verbosity level. More...

void	vl_ikm_set_max_niters (VlIKMFilt *f, vl_size max_niters)
	Set maximum number of iterations. More...

Detailed Description

Integer K-means (IKM) is an implementation of K-means clustering (or Vector Quantization, VQ) for integer data. This is particularly useful for clustering large collections of visual descriptors.

Use the function vl_ikm_new() to create a IKM quantizer. Initialize the IKM quantizer with K clusters by vl_ikm_init() or similar function. Use vl_ikm_train() to train the quantizer. Use vl_ikm_push() or vl_ikm_push_one() to quantize new data.

Given data \(x_1,\dots,x_N\in R^d\) and a number of clusters \(K\), the goal is to find assignments \(a_i\in\{1,\dots,K\},\) and centers \(c_1,\dots,c_K\in R^d\) so that the expected distortion

\[ E(\{a_{i}, c_j\}) = \frac{1}{N} \sum_{i=1}^N d(x_i, c_{a_i}) \]

is minimized. Here \(d(x_i, c_{a_i})\) is the distortion, i.e. the cost we pay for representing \( x_i \) by \( c_{a_i} \). IKM uses the squared distortion \(d(x,y)=\|x-y\|^2_2\).

Algorithms

Initialization

Most K-means algorithms are iterative and needs an initialization in the form of an initial choice of the centers \(c_1,\dots,c_K\). We include the following options:

User specified centers (::vl_ikm_init);
Random centers (::vl_ikm_init_rand);
Centers from K randomly selected data points (::vl_ikm_init_rand_data).

Lloyd

The Lloyd (also known as Lloyd-Max and LBG) algorithm iteratively:

Fixes the centers, optimizing the assignments (minimizing by exhaustive search the association of each data point to the centers);
Fixes the assignments and optimizes the centers (by descending the distortion error function). For the squared distortion, this step is in closed form.

This algorithm is not particularly efficient because all data points need to be compared to all centers, for a complexity \(O(dNKT)\), where T is the total number of iterations.

Elkan

The Elkan algorithm is an optimized variant of Lloyd. By making use of the triangle inequality, many comparisons of data points and centers are avoided, especially at later iterations. Usually 4-5 times less comparisons than Lloyd are preformed, providing a dramatic speedup in the execution time.

Author: Brian Fulkerson; Andrea Vedaldi

Typedef Documentation

◆ vl_ikmacc_t

typedef vl_int32 vl_ikmacc_t

IKM accumulator data type

Enumeration Type Documentation

◆ VlIKMAlgorithms

enum VlIKMAlgorithms

Enumerator
VL_IKM_LLOYD	Lloyd algorithm
VL_IKM_ELKAN	Elkan algorithm

Function Documentation

◆ vl_ikm_delete()

void vl_ikm_delete ( VlIKMFilt * f )

Parameters

f	IKM quantizer.

◆ vl_ikm_get_centers()

vl_ikmacc_t const* vl_ikm_get_centers ( VlIKMFilt const * f )

Parameters

f	IKM filter.

Returns: maximum number of iterations.

◆ vl_ikm_get_K()

vl_size vl_ikm_get_K ( VlIKMFilt const * f )

Parameters

f	IKM filter.

Returns: number of centers K.

◆ vl_ikm_get_max_niters()

vl_size vl_ikm_get_max_niters ( VlIKMFilt const * f )

Parameters

f	IKM filter.

Returns: maximum number of iterations.

◆ vl_ikm_get_ndims()

vl_size vl_ikm_get_ndims ( VlIKMFilt const * f )

Parameters

f	IKM filter.

Returns: data dimensionality.

◆ vl_ikm_get_verbosity()

int vl_ikm_get_verbosity ( VlIKMFilt const * f )

Parameters

f	IKM filter.

Returns: verbosity level.

◆ vl_ikm_new()

VlIKMFilt* vl_ikm_new ( int method )

Parameters

method Clustering algorithm.

Returns: new IKM quantizer.

The function allocates initializes a new IKM quantizer to operate based algorithm method.

method has values in the enumerations VlIKMAlgorithms.

◆ vl_ikm_push()

void vl_ikm_push	(	VlIKMFilt *	f,
		vl_uint32 *	asgn,
		vl_uint8 const *	data,
		vl_size	N
	)

Parameters

f	IKM quantizer.
asgn	Assignments (out).
data	data.
N	number of data (N `>=` 1).

The function projects the data data on the integer K-means clusters specified by the IKM quantizer f. Notice that the quantizer must be initialized.

◆ vl_ikm_push_one()

vl_uint vl_ikm_push_one	(	vl_ikmacc_t const *	centers,
		vl_uint8 const *	data,
		vl_size	M,
		vl_size	K
	)

Parameters

centers	centers.
data	datum to project.
K	number of centers.
M	dimensionality of the datum.

Returns: the cluster index.

The function projects the specified datum data on the clusters specified by the centers centers.

◆ vl_ikm_set_max_niters()

void vl_ikm_set_max_niters	(	VlIKMFilt *	f,
		vl_size	max_niters
	)

Parameters

f	IKM filter.
max_niters	maximum number of iterations.

◆ vl_ikm_set_verbosity()

void vl_ikm_set_verbosity	(	VlIKMFilt *	f,
		int	verb
	)

Parameters

f	IKM filter.
verb	verbosity level.

◆ vl_ikm_train()

int vl_ikm_train	(	VlIKMFilt *	f,
		vl_uint8 const *	data,
		vl_size	N
	)

Parameters

f	IKM quantizer.
data	data.
N	number of data (N `>=` 1).

Returns: -1 if an overflow may have occurred.

VLFeat.org

Data Structures

Typedefs

Enumerations

Functions

Detailed Description

Algorithms

Initialization

Lloyd

Elkan

Typedef Documentation

◆ vl_ikmacc_t

Enumeration Type Documentation

◆ VlIKMAlgorithms

Function Documentation

◆ vl_ikm_delete()

◆ vl_ikm_get_centers()

◆ vl_ikm_get_K()

◆ vl_ikm_get_max_niters()

◆ vl_ikm_get_ndims()

◆ vl_ikm_get_verbosity()

◆ vl_ikm_new()

◆ vl_ikm_push()

◆ vl_ikm_push_one()

◆ vl_ikm_set_max_niters()

◆ vl_ikm_set_verbosity()

◆ vl_ikm_train()