sort.NetworkCluster

sort.NetworkCluster(match_threshold=0.5)

Network clustering of images

Cluster images with a simple network in which images are nodes and an edge connects each pair of images whose similarity score is above the match_threshold.
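The idea of thresholding a similarity matrix into a network can be sketched in a few lines. This is a minimal illustration, not pyseter's implementation: it assumes that each cluster is a connected component of the thresholded graph, which this page does not state, and the helper name threshold_components is hypothetical. It indexes the matrix as similarity[i][j], so it accepts nested lists or a NumPy array.

```python
from collections import deque


def threshold_components(similarity, match_threshold=0.5):
    """Label connected components of the thresholded similarity graph.

    Hypothetical helper for illustration; not part of pyseter.
    """
    n = len(similarity)
    labels = [-1] * n  # -1 means "not yet assigned to a cluster"
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Breadth-first search outward from an unlabelled image
        labels[start] = next_label
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                # An edge exists when the pair's score is above the threshold
                if j != i and labels[j] == -1 and similarity[i][j] > match_threshold:
                    labels[j] = next_label
                    queue.append(j)
        next_label += 1
    return labels


# Images 0 and 1 match, images 1 and 2 match, image 3 matches nothing.
similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.7, 0.0],
    [0.2, 0.7, 1.0, 0.3],
    [0.1, 0.0, 0.3, 1.0],
]
labels = threshold_components(similarity, match_threshold=0.5)
print(labels)  # [0, 0, 0, 1]
```

Under this assumption, images 0 and 2 share a cluster even though their own score (0.2) is below the threshold, because both are linked to image 1. Raising match_threshold removes edges and tends to split clusters apart.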

Parameters

Name Type Description Default
match_threshold float Similarity score threshold above which two images are considered to contain the same animal. Must lie in the interval [0.0, 1.0]. 0.5

Notes

Network clustering works best with smaller datasets, on the order of 1,000 images.

Examples

>>> import numpy as np
>>> from pyseter.sort import NetworkCluster
>>> from sklearn.metrics.pairwise import cosine_similarity
>>> from numpy.random import normal
>>> 
>>> cluster1 = normal(-200, 1, size=(15, 5504))
>>> cluster2 = normal(200, 1, size=(5, 5504))
>>> feature_array = np.vstack([cluster1, cluster2])
>>> scores = cosine_similarity(feature_array)
>>> 
>>> nc = NetworkCluster(match_threshold=0.5)
>>> results = nc.cluster_images(scores)
>>> len(np.unique(results.cluster_idx))
2

Methods

Name Description
cluster_images Cluster images

cluster_images

sort.NetworkCluster.cluster_images(similarity, message=True)

Cluster images

Cluster images based on their similarity scores with network clustering.

Parameters

Name Type Description Default
similarity np.ndarray Array with shape (image_count, image_count) indicating the similarity between each pair of images. required
message bool Whether to print a message about potential false positives to the console. True

Returns

Name Type Description
results ClusterResults A ClusterResults object. Integer labels for the cluster assignment of each image can be accessed with results.cluster_idx.
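This page does not say how potential false positives are identified. One plausible situation, offered here only as an assumption, is a cluster that is held together by a chain of matches while some of its members do not match each other directly. The sketch below flags such clusters from a similarity matrix and a list of integer labels. The helper weakly_linked_clusters is hypothetical and is not part of pyseter.

```python
from itertools import combinations


def weakly_linked_clusters(similarity, labels, match_threshold=0.5):
    """Return labels of clusters containing a pair that does not exceed the threshold.

    Hypothetical helper for illustration; not part of pyseter.
    """
    flagged = set()
    for i, j in combinations(range(len(labels)), 2):
        same_cluster = labels[i] == labels[j]
        # A within-cluster pair at or below the threshold was linked only
        # indirectly, through other images in the cluster.
        if same_cluster and similarity[i][j] <= match_threshold:
            flagged.add(labels[i])
    return sorted(flagged)


similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.7, 0.0],
    [0.2, 0.7, 1.0, 0.3],
    [0.1, 0.0, 0.3, 1.0],
]
labels = [0, 0, 0, 1]  # images 0, 1, 2 in one cluster; image 3 alone
flagged = weakly_linked_clusters(similarity, labels, match_threshold=0.5)
print(flagged)  # [0]
```

With real output, the same check could be run on the similarity array passed to cluster_images and the labels in results.cluster_idx, and flagged clusters reviewed by eye.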