sort.HierarchicalCluster

sort.HierarchicalCluster(match_threshold=0.5)

Hierarchical clustering of images

Cluster images with the hierarchical agglomerative clustering (HAC) algorithm from scikit-learn.

Parameters

Name Type Description Default
match_threshold float Threshold dictating how closely knit clusters should be. Must be between zero and one. 0.5

Examples

>>> import numpy as np
>>> from pyseter.sort import HierarchicalCluster
>>> from numpy.random import normal
>>> 
>>> cluster1 = normal(-200, 1, size=(15, 5504))
>>> cluster2 = normal(200, 1, size=(5, 5504))
>>> feature_array = np.vstack([cluster1, cluster2])
>>> 
>>> hac = HierarchicalCluster(match_threshold=0.5)
>>> cluster_indices = hac.cluster_images(feature_array)
>>> len(np.unique(cluster_indices))
2

Attributes

Name Type Description
match_threshold float Threshold indicating how closely knit clusters should be.

Notes

HierarchicalCluster works best on larger datasets, roughly more than 1,000 images, and may be prone to false negative errors.

HierarchicalCluster uses HAC with a distance threshold instead of a fixed number of clusters (so the number of clusters is not known in advance), complete linkage, and cosine distance as the metric.

Methods

Name Description
cluster_images Cluster images

cluster_images

sort.HierarchicalCluster.cluster_images(features)

Cluster images

Cluster feature vectors according to their cosine distance from one another.
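
For readers unfamiliar with the metric, cosine distance is one minus the cosine similarity of two vectors, so it depends only on direction and not on magnitude. A minimal NumPy sketch of the pairwise computation is shown here for illustration only. The function name `cosine_distance_matrix` is hypothetical and is not part of this package.

```python
import numpy as np


def cosine_distance_matrix(features):
    """Pairwise cosine distance (1 - cosine similarity) between rows."""
    # Scale each row to unit length so dot products equal cosine similarity.
    # Note: an all-zero row would cause a division by zero here.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / norms
    # Clip to guard against tiny floating-point overshoot outside [-1, 1].
    similarity = np.clip(unit @ unit.T, -1.0, 1.0)
    return 1.0 - similarity


features = np.array([
    [1.0, 0.0],
    [2.0, 0.0],   # same direction as row 0
    [0.0, 3.0],   # orthogonal to row 0
    [-1.0, 0.0],  # opposite direction to row 0
])
distances = cosine_distance_matrix(features)
```

In this toy input, rows pointing the same way have distance 0, orthogonal rows have distance 1, and opposite rows have distance 2.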

Parameters

Name Type Description Default
features np.ndarray Array with shape (image_count, feature_count) containing the feature vector for each image. required

Returns

Name Type Description
np.ndarray NumPy array of integer cluster labels, one per image (row of features).
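
A common next step is to collect the images that share a label. The sketch below assumes the returned array holds one integer label per row of `features`, as the class example suggests. The label values are made up for illustration and the variable `groups` is hypothetical.

```python
import numpy as np

# Hypothetical labels, standing in for the output of cluster_images.
cluster_indices = np.array([0, 0, 1, 0, 2, 1])

# Map each cluster label to the row positions of the images assigned to it.
groups = {
    int(label): np.flatnonzero(cluster_indices == label).tolist()
    for label in np.unique(cluster_indices)
}
```

The row positions in `groups` can then be used to index a parallel list of image file names.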