Cluster Analysis for Machine Learning

Last modified: 2023-08-20

Clustering Machine Learning

We can find the number of clusters using methods such as K-means.

Find Optimal Number of Clusters

K-means & Elbow Curve

Reference: https://www.kaggle.com/code/jonbown/ai-ctf-submissions?scriptVersionId=105606691&cellId=39

We may find the optimal number of clusters by using K-means algorithm and observing the Elbow graph.

import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

clusters = np.load("example.npy")

# specify the range of the number of clusters
K = range(1, 10)

distortions = []
for i in K:
	kmeans = KMeans(n_clusters=i)
	kmeans.fit(clusters)
	distortions.append(kmeans.inertia_)

plt.plot(K, distortions)
plt.xlabel("Number of clusters")
plt.ylabel("Distortion")

Seeing the output graph, the last point where the distortion (or inertia) drops sharply may be the optimal number of clusters.