sort.HierarchicalCluster
sort.HierarchicalCluster(match_threshold=0.5)
Hierarchical clustering of images
Cluster images with the hierarchical agglomerative clustering (HAC) algorithm from scikit-learn.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| match_threshold | float | Threshold dictating how closely knit clusters should be. Must be between zero and one. | 0.5 |
Examples
>>> import numpy as np
>>> from pyseter.sort import HierarchicalCluster
>>> from numpy.random import normal
>>>
>>> cluster1 = normal(-200, 1, size=(15, 5504))
>>> cluster2 = normal(200, 1, size=(5, 5504))
>>> feature_array = np.vstack([cluster1, cluster2])
>>>
>>> hac = HierarchicalCluster(match_threshold=0.5)
>>> cluster_indices = hac.cluster_images(feature_array)
>>> len(np.unique(cluster_indices))
2
Attributes
| Name | Type | Description |
|---|---|---|
| match_threshold | float | Threshold indicating how closely knit clusters should be. |
Notes
HierarchicalCluster works best for larger datasets, say, over 1000 images. HierarchicalCluster may be prone to false negative errors.
HierarchicalCluster uses HAC with a distance threshold rather than a fixed number of clusters (i.e., the number of clusters is not known in advance), complete linkage, and cosine distance as the metric.
Methods
| Name | Description |
|---|---|
| cluster_images | Cluster images |
cluster_images
sort.HierarchicalCluster.cluster_images(features)
Cluster images
Cluster feature vectors according to their cosine distance from one another.
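For readers unfamiliar with cosine distance, the following sketch computes it for every pair of rows in a small feature array using NumPy. The helper name `cosine_distance_matrix` is hypothetical and is not part of pyseter; it only illustrates the quantity being compared, which is one minus the cosine similarity.

```python
import numpy as np

def cosine_distance_matrix(features: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance (1 - cosine similarity) between rows."""
    # Scale each row to unit length; assumes no row is all zeros.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / norms
    # The dot product of unit vectors is the cosine similarity.
    similarity = np.clip(unit @ unit.T, -1.0, 1.0)
    return 1.0 - similarity

features = np.array([
    [1.0, 0.0],   # reference direction
    [2.0, 0.0],   # same direction, different length
    [0.0, 3.0],   # orthogonal direction
    [-1.0, 0.0],  # opposite direction
])
distances = cosine_distance_matrix(features)
```

Cosine distance ignores vector length: the first two rows are at distance 0, orthogonal rows are at distance 1, and opposite rows are at distance 2.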
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| features | np.ndarray | Array with shape (image_count, feature_count) containing the feature vector for each image. | required |
Returns
| Name | Type | Description |
|---|---|---|
| | np.ndarray | NumPy array containing an integer cluster label for each image. |
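One way to work with the returned labels is to group image positions by cluster. The sketch below uses a hand-written label array in place of real `cluster_images` output, and assumes the labels are ordered like the rows of `features`.

```python
import numpy as np

# Hypothetical labels standing in for the output of cluster_images.
cluster_indices = np.array([0, 0, 1, 0, 1])

# Map each cluster label to the row positions of the images assigned to it.
groups = {
    int(label): np.flatnonzero(cluster_indices == label).tolist()
    for label in np.unique(cluster_indices)
}
```

Here `groups` maps label 0 to positions 0, 1, and 3, and label 1 to positions 2 and 4.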