VLFeat offers a hierarchical version of integer k-means, which
recursively applies vl_ikmeans to compute finer and finer
partitions. For more details, see the Hierarchical Integer
k-means API reference and the Integer k-means tutorial.
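Conceptually, the hierarchy is built by clustering the data at a node and then recursing on the data assigned to each cluster. The following simplified sketch illustrates the idea (the real vl_hikmeans is a C implementation with more options; hikmeans_sketch and its fixed-depth recursion are our own illustration, not the library's code):

```matlab
function tree = hikmeans_sketch(data, K, depth)
% HIKMEANS_SKETCH Illustrative recursive hierarchical integer k-means.
  [tree.centers, A] = vl_ikmeans(data, K) ;  % partition this node's data into K clusters
  if depth > 1
    for k = 1:K
      % recurse on the data assigned to cluster k
      tree.sub(k) = hikmeans_sketch(data(:, A == k), K, depth - 1) ;
    end
  else
    tree.sub = [] ;  % a leaf node has no children
  end
end
```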
Usage
First, we generate some random data to cluster in [0,255]^2:
data  = uint8(rand(2,10000)  * 255) ;
datat = uint8(rand(2,100000) * 255) ;
To cluster this data, we simply use vl_hikmeans:
K = 3 ;
nleaves = 100 ;
[tree,A] = vl_hikmeans(data,K,nleaves) ;
Here nleaves is the desired number of leaf
clusters. The algorithm terminates when there are at least
nleaves leaf nodes, creating a tree of depth
depth = max(1, ceil(log_K(nleaves)))
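As a quick check (our own snippet, not part of the library), the depth can be computed directly from K and nleaves; note that for the values used above it matches the depth of 5 reported by the tree structure shown later:

```matlab
K = 3 ; nleaves = 100 ;
% log_K(nleaves) = log(nleaves) / log(K) ~= 4.19
depth = max(1, ceil(log(nleaves) / log(K)))  % = 5
```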
To assign labels to the new data, we use vl_hikmeanspush:
AT = vl_hikmeanspush(tree,datat) ;
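Each column of AT is the path of the corresponding datum down the tree, with one branch index per level. If a single integer label per datum is needed, the path can be folded into a leaf index, for example by reading it as a base-K number (a sketch; leafIDs and the 1-based indexing assumption are ours, not part of the API):

```matlab
K = 3 ;                          % branching factor used to build the tree
depth = size(AT,1) ;             % one row of AT per tree level
leafIDs = ones(1, size(AT,2)) ;  % hypothetical helper variable
for d = 1:depth
  % fold the branch indices into a single base-K leaf index
  leafIDs = (leafIDs - 1) * K + double(AT(d,:)) ;
end
```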
Tree structure
The output tree is a MATLAB structure representing the tree of
clusters:
> tree
tree =
          K: 3
      depth: 5
    centers: [2x3 int32]
        sub: [1x3 struct]
The field centers is the matrix of the cluster centers at the
root node. If the depth of the tree is larger than 1, then the field
sub is a structure array with one entry for each cluster. Each
element is in turn a tree:
> tree.sub
ans =
1x3 struct array with fields:
    centers
    sub
with a field centers for its cluster centers and a field
sub for its children. When there are no children, this
field is the empty matrix:
> tree.sub(1).sub(1).sub(1).sub(1)
ans =
    centers: [2x3 int32]
        sub: []
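This recursive structure can be traversed directly. For example, the following sketch (our own helper, not part of VLFeat) counts the leaf clusters of a tree by recursing until sub is empty:

```matlab
function n = count_leaves(node)
% COUNT_LEAVES Count leaf clusters in a vl_hikmeans tree (illustrative sketch).
  if isempty(node.sub)
    n = size(node.centers, 2) ;  % each center at a leaf node is a leaf cluster
  else
    n = 0 ;
    for k = 1:numel(node.sub)
      n = n + count_leaves(node.sub(k)) ;  % recurse into each child
    end
  end
end
```

Applied to the tree built above, count_leaves(tree) should return at least the nleaves requested.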
Elkan
VLFeat supports two different implementations of k-means. While they
produce identical output, the Elkan method is sometimes faster.
The method parameter controls which method is used. Consider the case when
K=10 and our data is now 128-dimensional (e.g. SIFT descriptors):
K = 10 ;
nleaves = 1000 ;
data = uint8(rand(128,10000) * 255) ;
tic ; [tree,A] = vl_hikmeans(data,K,nleaves,'method','lloyd') ; % default
t_lloyd = toc
tic ; [tree,A] = vl_hikmeans(data,K,nleaves,'method','elkan') ;
t_elkan = toc

t_lloyd =
    8.0743
t_elkan =
    3.0427