quickshift.h implements an image segmentation algorithm based on the quick shift clustering algorithm [29] .
Overview
Quick shift [29] is a fast mode seeking algorithm, similar to mean shift. The algorithm segments an RGB image (or any image with more than one channel) by identifying clusters of pixels in the joint spatial and color dimensions. Segments are local (superpixels) and can be used as a basis for further processing.
Given an image, the algorithm calculates a forest of pixels whose branches are labeled with a distance value (vl_quickshift_get_parents, vl_quickshift_get_dists). This specifies a hierarchical segmentation of the image, with segments corresponding to subtrees. Useful superpixels can be identified by cutting the branches whose distance label is above a given threshold (the threshold can be either fixed by hand, or determined by cross validation).
Parameter influencing the algorithm are:
- Kernel size. The pixel density and its modes are estimated by using a Parzen window estimator with a Gaussian kernel of the specified size (vl_quickshift_set_kernel_size). The larger the size, the larger the neighborhoods of pixels considered.
- Maximum distance. This (vl_quickshift_set_max_dist) is the maximum distance between two pixels that the algorithm considers when building the forest. In principle, it can be infinity (so that a tree is returned), but in practice it is much faster to consider only relatively small distances (the maximum distance can be set to a small multiple of the kernel size).
Usage
- Create a new quick shift object (vl_quickshift_new). The object can be reused for multiple images of the same size.
- Configure quick shift by setting the kernel size (vl_quickshift_set_kernel_size) and the maximum gap (vl_quickshift_set_max_dist). The latter is in principle not necessary, but useful to speedup processing.
- Process an image (vl_quickshift_process).
- Retrieve the parents (vl_quickshift_get_parents) and the distances (vl_quickshift_get_dists). These can be used to segment the image in superpixels.
- Delete the quick shift object (vl_quickshift_delete).
Technical details
For each pixel (x,y), quick shift regards \( (x,y,I(x,y)) \) as a sample from a d + 2 dimensional vector space. It then calculates the Parzen density estimate (with a Gaussian kernel of standard deviation \( \sigma \))
\[ E(x,y) = P(x,y,I(x,y)) = \sum_{x'y'} \frac{1}{(2\pi\sigma)^{d+2}} \exp \left( -\frac{1}{2\sigma^2} \left[ \begin{array}{c} x - x' \\ y - y' \\ I(x,y) - I(x',y') \\ \end{array} \right] \right) \]
Then quick shift construct a tree connecting each image pixel to its nearest neighbor which has greater density value. Formally, write \( (x',y') >_P (x,y) \) if, and only if,
\[ P(x',y',I(x',y')) > P(x,y,I(x,y))}. \]
Each pixel (x, y) is connected to the closest higher density pixel parent(x, y) that achieves the minimum distance in
\[ \mathrm{dist}(x,y) = \mathrm{min}_{(x',y') > P(x,y)} \left( (x - x')^2 + (y - y')^2 + \| I(x,y) - I(x',y') \|_2^2 \right). \]