sort.HierarchicalCluster

sort.HierarchicalCluster(match_threshold=0.5)

Hierarchical clustering of images

Cluster images with the hierarchical agglomerative clustering (HAC) algorithm from scikit-learn.

Parameters

Name	Type	Description	Default
match_threshold	float	Threshold dictating how closely knit clusters should be. Must be between zero and one.	`0.5`

Examples

>>> import numpy as np
>>> from pyseter.sort import HierarchicalCluster
>>> from numpy.random import normal
>>> 
>>> cluster1 = normal(-200, 1, size=(15, 5504))
>>> cluster2 = normal(200, 1, size=(5, 5504))
>>> feature_array = np.vstack([cluster1, cluster2])
>>> 
>>> hac = HierarchicalCluster(match_threshold=0.5)
>>> cluster_indices = hac.cluster_images(feature_array)
>>> len(np.unique(cluster_indices))
2

Attributes

Name	Type	Description
match_threshold	float	Threshold indicating how closely knit clusters should be.

Notes

HierarchicalCluster works best on larger datasets (for example, more than 1,000 images). It may be prone to false negative errors.

HierarchicalCluster runs HAC with a distance threshold instead of a fixed number of clusters, so the number of clusters is not known in advance. It uses complete linkage and cosine distance as the metric.

Methods

Name	Description
cluster_images	Cluster images

cluster_images

sort.HierarchicalCluster.cluster_images(features)

Cluster images

Cluster feature vectors according to their cosine distance from one another.
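For reference, cosine distance is one minus the cosine similarity of two vectors, so it depends on their direction and not their magnitude. The helper below is a minimal sketch of that definition; `cosine_distance` is a hypothetical name used for illustration and is not part of this API.

```python
import numpy as np

def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v); hypothetical helper."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    similarity = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return 1.0 - similarity

same_direction = cosine_distance([1, 0], [2, 0])   # parallel vectors
orthogonal = cosine_distance([1, 0], [0, 1])       # perpendicular vectors
opposite = cosine_distance([1, 0], [-1, 0])        # antiparallel vectors
```

The distance ranges from 0 for parallel vectors to 2 for vectors pointing in opposite directions.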

Parameters

Name	Type	Description	Default
features	np.ndarray	Array with shape `(image_count, feature_count)` containing the feature vector for each image.	required

Returns

Name	Type	Description
	np.ndarray	NumPy array of integer cluster labels, one for each image in `features`.
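A common next step is to group image indices by their cluster label. The sketch below assumes the returned labels are aligned with the rows of `features`, so that the label at position `i` belongs to image `i`; `group_by_cluster` is a hypothetical helper, not part of this API.

```python
from collections import defaultdict

def group_by_cluster(labels):
    """Map each cluster label to the list of image indices assigned to it."""
    groups = defaultdict(list)
    for image_index, label in enumerate(labels):
        groups[int(label)].append(image_index)
    return dict(groups)

# Stand-in for the array returned by cluster_images.
example_labels = [0, 0, 1, 0, 1]
groups = group_by_cluster(example_labels)
```

The helper accepts any iterable of integers, so the NumPy array returned by `cluster_images` can be passed in unchanged.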