K-Means Clustering

Unsupervised learning: group similar data points into clusters

Initial Centroids

Using K-Means++ initialization

Initialization Method

Why K-Means++?

K-Means++ is an initialization algorithm that spreads the initial centroids out, favoring data points that lie far from the centroids already chosen. This typically leads to:

  • Faster convergence
  • Better final clustering quality
  • More consistent results

Algorithm: The first centroid is chosen uniformly at random from the data points. Each subsequent centroid is drawn from the data points, where each point's selection probability is proportional to the squared distance from that point to its nearest already-chosen centroid.
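The seeding rule above can be sketched in plain Python as follows. This is a minimal illustration, not the demo's own code: points are assumed to be tuples of numbers, and the names `squared_distance` and `kmeans_pp_init` are hypothetical. Sampling uses the standard library's `random.Random.choices` with the squared distances as weights.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng=None):
    """Pick k initial centroids from `points` with K-Means++ seeding (sketch)."""
    if rng is None:
        rng = random.Random()
    # Step 1: the first centroid is a uniformly random data point.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # D(x)^2: squared distance from each point to its nearest chosen centroid.
        d2 = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            centroids.append(rng.choice(points))
            continue
        # Step 2: draw the next centroid with probability D(x)^2 / sum(D^2).
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids
```

Because a point that is already a centroid has weight zero, it cannot be drawn again while other points remain at nonzero distance; for example, `kmeans_pp_init([(0, 0), (0, 1), (10, 10), (10, 11)], 2)` always returns two different points.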

Elbow Method

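Choose a reasonable k by running K-Means for a range of k values, plotting the inertia (the sum of squared distances from each point to its assigned centroid) against k, and looking for the "elbow" where adding more clusters stops reducing inertia substantially.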
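A minimal sketch of computing the elbow curve, assuming tuple-valued points and a plain Lloyd's K-Means with random initialization (the helper names `kmeans` and `elbow_curve` are hypothetical, and random seeding is used only to keep the example short):

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def kmeans(points, k, iters=100, rng=None):
    """Lloyd's K-Means with random initialization; returns (centroids, inertia)."""
    if rng is None:
        rng = random.Random()
    centroids = rng.sample(points, k)  # random init: k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if its cluster ends up empty.
        new = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new == centroids:  # converged: assignments no longer move centroids
            break
        centroids = new
    inertia = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, inertia


def elbow_curve(points, max_k, rng=None):
    """Inertia for k = 1..max_k; plot these values and look for the bend."""
    return [kmeans(points, k, rng=rng)[1] for k in range(1, max_k + 1)]
```

Each run can land in a different local minimum, so in practice one would take the best of several restarts per k before reading off the elbow; otherwise the curve can be noisy.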

About K-Means & K-Means++

Type: Unsupervised Learning

Goal: Minimize within-cluster variance
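Written out, the objective (often called inertia or the within-cluster sum of squares) is shown below, using C_j for the points in cluster j, μ_j for its centroid (mean), and σ_j² for its mean squared deviation; this notation is ours, not the page's:

```latex
J(C_1,\dots,C_k) = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2
                 = \sum_{j=1}^{k} |C_j| \, \sigma_j^2,
\qquad \mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x,
\qquad \sigma_j^2 = \frac{1}{|C_j|} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2
```

Both steps of the standard assign/update loop never increase J, which is why K-Means converges, though only to a local minimum.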

Init Methods:

  • K-Means++: Distance-weighted seeding (default) that spreads the initial centroids apart
  • Random: Randomly selects k points as initial centroids

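Complexity: O(n × k × i) per run, where n = number of points, k = number of clusters, and i = iterations. For d-dimensional data each distance costs O(d), giving O(n × k × d × i).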
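The n × k × i term comes from the assignment step, which compares every point with every centroid on each iteration. The sketch below (hypothetical name `lloyd`, tuple-valued points and centroids assumed) counts those distance evaluations so the bound can be checked directly:

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points, centroids, max_iters=100):
    """Lloyd's iterations from given initial centroids (a list of tuples).

    Returns (centroids, labels, iterations, distance_evals).
    """
    k, dim = len(centroids), len(points[0])
    distance_evals = 0
    labels = []
    for it in range(1, max_iters + 1):
        # Assignment step: n points x k centroids distance evaluations.
        labels = []
        for p in points:
            dists = [squared_distance(p, c) for c in centroids]
            distance_evals += k
            labels.append(dists.index(min(dists)))
        # Update step: one pass over the points, accumulating per-cluster sums.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, j in zip(points, labels):
            counts[j] += 1
            for d, x in enumerate(p):
                sums[j][d] += x
        new = [tuple(s / counts[j] for s in sums[j]) if counts[j] else centroids[j]
               for j in range(k)]
        if new == centroids:  # no centroid moved: converged
            return centroids, labels, it, distance_evals
        centroids = new
    return centroids, labels, max_iters, distance_evals
```

For four points and two centroids that converge on the second iteration, the count is 4 × 2 × 2 = 16 distance evaluations, matching n × k × i.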

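K-Means++ Advantage: Compared to random initialization, it typically reduces the number of iterations needed and improves final cluster quality, sometimes dramatically depending on the dataset. In expectation, the K-Means++ seeding alone already has a cost within a factor of O(log k) of the optimal clustering.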