To understand how the K-Means clustering algorithm works, let us consider a small customer dataset.
Each customer is represented using two attributes: Annual Income and Spending Score.
The objective of K-Means is to group similar customers into clusters such that customers within the same cluster are more similar to each other than to those in other clusters.
In this example, we apply K-Means with K = 2 and calculate the centroid and cluster assignments step by step using Euclidean distance.
Distance Measure Used
Most commonly used:
- Euclidean Distance
Why?
- Simple
- Works well for numerical data
What follows is a step-by-step solution of the K-Means clustering algorithm (with K = 2) on a small customer dataset.
Each customer is described by two features:
- Feature 1 (x) → Annual Income (₹L)
- Feature 2 (y) → Spending Score
Each customer is a 2-D point (x, y).
The goal of K-Means:
Group similar customers into clusters based on distance.
Given Data (Customers as points)
| Customer | Income (x) | Spending (y) |
| --- | --- | --- |
| P1 | 2 | 15 |
| P2 | 3 | 18 |
| P3 | 4 | 12 |
| P4 | 10 | 80 |
| P5 | 12 | 85 |
| P6 | 11 | 78 |
| P7 | 4 | 16 |
| P8 | 3 | 14 |
Looking at the table, the customers fall into two natural groups:
- Low income – low spending
- High income – high spending
So K = 2.
STEP 1: Choose Number of Clusters
K = 2
👉 Because visually the data has two natural groups:
- Low income – low spending
- High income – high spending
STEP 2: Initialize Centroids
Two customers are randomly picked as initial centroids:
- Centroid C1 = (2, 15) → from P1
- Centroid C2 = (10, 80) → from P4
These are written and circled in the images.
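The initialization step can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the `customers` dictionary and the helper `pick_initial_centroids` are names assumed here for the example, and the `seed` parameter is only there to make a random draw repeatable. The walkthrough's own choice (P1 and P4) is then fixed by hand so that later numbers match the text.

```python
import random

# Customer points from the table: (annual income, spending score).
customers = {
    "P1": (2, 15), "P2": (3, 18), "P3": (4, 12), "P4": (10, 80),
    "P5": (12, 85), "P6": (11, 78), "P7": (4, 16), "P8": (3, 14),
}

def pick_initial_centroids(points, k, seed=None):
    """Pick k distinct customers at random to serve as initial centroids.

    Hypothetical helper for illustration; `seed` makes the draw repeatable.
    """
    rng = random.Random(seed)
    names = rng.sample(sorted(points), k)
    return {name: points[name] for name in names}

# A random draw; which customers come out depends on the seed.
random_start = pick_initial_centroids(customers, k=2, seed=0)

# The walkthrough's choice, set explicitly so the numbers match the text.
c1, c2 = customers["P1"], customers["P4"]
print(c1, c2)  # (2, 15) (10, 80)
```

Different initial centroids can lead to different intermediate values, which is why the example pins them to P1 and P4.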
STEP 3: Distance Formula Used
For every point, distance to each centroid is calculated using Euclidean distance:
$d = \sqrt{(x - c_x)^2 + (y - c_y)^2}$
Where:
- $x, y$ → customer data
- $c_x, c_y$ → centroid values
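The distance formula translates directly into a few lines of Python. The sketch below assumes each point and centroid is a 2-tuple `(x, y)`; the function name `euclidean_distance` is chosen here for illustration. It evaluates P1 = (2, 15) against the two initial centroids from Step 2.

```python
import math

def euclidean_distance(point, centroid):
    """Straight-line distance between a 2-D point and a centroid."""
    x, y = point
    cx, cy = centroid
    return math.sqrt((x - cx) ** 2 + (y - cy) ** 2)

# P1 = (2, 15) against C1 = (2, 15) and C2 = (10, 80).
d_c1 = euclidean_distance((2, 15), (2, 15))
d_c2 = euclidean_distance((2, 15), (10, 80))
print(round(d_c1, 2), round(d_c2, 2))  # 0.0 65.49
```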
▶️ Point P1 = (2, 15)
Distance to C1 = (2,15)
$\sqrt{(2-2)^2 + (15-15)^2} = \sqrt{0} = 0$
Distance to C2 = (10,80)
$\sqrt{(2-10)^2 + (15-80)^2} = \sqrt{64 + 4225} = \sqrt{4289} \approx 65.49$
👉 Assigned to Cluster C1
▶️ Point P2 = (3,18)
Distance to C1 = (2,15)
$\sqrt{(3-2)^2 + (18-15)^2} = \sqrt{1 + 9} = \sqrt{10} \approx 3.16$
Distance to C2 = (10,80)
$\sqrt{(3-10)^2 + (18-80)^2} = \sqrt{49 + 3844} = \sqrt{3893} \approx 62.39$
👉 P2 → Cluster C1
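The assignment rule used for P1 and P2 (pick whichever centroid is closer) can be sketched as below. The function name `nearest_centroid` and the dictionary layout are assumptions made for this example; `math.dist` is the standard-library Euclidean distance, available from Python 3.8.

```python
import math

def nearest_centroid(point, centroids):
    """Return the label of the centroid closest to `point`."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

# Initial centroids from Step 2.
centroids = {"C1": (2, 15), "C2": (10, 80)}

print(nearest_centroid((2, 15), centroids))  # C1  (P1)
print(nearest_centroid((3, 18), centroids))  # C1  (P2)
```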
🔹 Update C1 using P1 and P2
The centroid is the mean of the points in the cluster:
$C_{1x} = \frac{2 + 3}{2} = 2.5$
$C_{1y} = \frac{15 + 18}{2} = 16.5$
📌 New C1 = (2.5, 16.5)
C2 remains (10,80)
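The centroid update is a coordinate-wise mean, which a short Python sketch makes concrete. The function name `centroid_of` is an assumption for this example; points are taken to be `(x, y)` tuples.

```python
def centroid_of(points):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return (mean_x, mean_y)

# C1 after P1 = (2, 15) and P2 = (3, 18) have been assigned to it.
new_c1 = centroid_of([(2, 15), (3, 18)])
print(new_c1)  # (2.5, 16.5)
```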
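▶️ Point P3 = (4,12)
Distance to updated C1 = (2.5,16.5)
$\sqrt{(4-2.5)^2 + (12-16.5)^2} = \sqrt{2.25 + 20.25} = \sqrt{22.5} \approx 4.743$
Distance to C2 = (10,80)
$\sqrt{(4-10)^2 + (12-80)^2} = \sqrt{36 + 4624} = \sqrt{4660} \approx 68.26$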
👉 P3 → Cluster C1
🔹 Update C1 using P1, P2, P3
$C_{1x} = \frac{2 + 3 + 4}{3} = 3$
$C_{1y} = \frac{15 + 18 + 12}{3} = 15$
📌 C1 = (3,15)
▶️ Point P4 = (10,80)
Distance to C1
$\sqrt{(10-3)^2 + (80-15)^2} = \sqrt{49 + 4225} = \sqrt{4274} \approx 65.38$
Distance to C2
$\sqrt{(10-10)^2 + (80-80)^2} = \sqrt{0} = 0$
👉 P4 → Cluster C2
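▶️ Point P5 = (12,85)
Distance to C1
$\sqrt{(12-3)^2 + (85-15)^2} = \sqrt{81 + 4900} = \sqrt{4981} \approx 70.58$
Distance to C2
$\sqrt{(12-10)^2 + (85-80)^2} = \sqrt{4 + 25} = \sqrt{29} \approx 5.39$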
👉 P5 → Cluster C2
🔹 Update C2 using P4 and P5
$C_{2x} = \frac{10 + 12}{2} = 11$
$C_{2y} = \frac{80 + 85}{2} = 82.5$
📌 C2 = (11,82.5)
▶️ Point P6 = (11,78)
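Distance to C1 = (3,15)
$\sqrt{(11-3)^2 + (78-15)^2} = \sqrt{64 + 3969} = \sqrt{4033} \approx 63.51$
Distance to C2 = (11,82.5)
$\sqrt{(11-11)^2 + (78-82.5)^2} = \sqrt{0 + 20.25} = 4.5$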
👉 P6 → Cluster C2
🔹 Update C2 using P4, P5, P6
$C_{2x} = \frac{10 + 12 + 11}{3} = 11$
$C_{2y} = \frac{80 + 85 + 78}{3} = 81$
📌 C2 = (11,81)
▶️ Point P7 = (4,16)
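Distance to C1 = (3,15)
$\sqrt{(4-3)^2 + (16-15)^2} = \sqrt{1 + 1} = \sqrt{2} \approx 1.414$
Distance to C2
$\sqrt{(4-11)^2 + (16-81)^2} = \sqrt{49 + 4225} = \sqrt{4274} \approx 65.38$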
👉 P7 → Cluster C1
🔹 Update C1 using P1, P2, P3, P7
$C_{1x} = \frac{2 + 3 + 4 + 4}{4} = 3.25$
$C_{1y} = \frac{15 + 18 + 12 + 16}{4} = 15.25$
📌 C1 = (3.25,15.25)
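▶️ Point P8 = (3,14)
Distance to C1
$\sqrt{(3-3.25)^2 + (14-15.25)^2} = \sqrt{0.0625 + 1.5625} = \sqrt{1.625} \approx 1.275$
Distance to C2
$\sqrt{(3-11)^2 + (14-81)^2} = \sqrt{64 + 4489} = \sqrt{4553} \approx 67.48$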
👉 P8 → Cluster C1
🔹 Final Update of C1
$C_{1x} = \frac{2 + 3 + 4 + 4 + 3}{5} = 3.2$
$C_{1y} = \frac{15 + 18 + 12 + 16 + 14}{5} = 15$
📌 Final C1 = (3.2,15)
📌 Final C2 = (11,81)
✅ FINAL ANSWER
Cluster C1 (Low income – Low spending)
Centroid = (3.2, 15)
Points = P1, P2, P3, P7, P8
Cluster C2 (High income – High spending)
Centroid = (11, 81)
Points = P4, P5, P6
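The whole walkthrough can be reproduced with a short Python sketch. It mirrors the procedure used above, in which each customer is assigned in turn and the winning centroid is recomputed immediately (a sequential style of update), rather than reassigning all points before every update. The names `sequential_kmeans`, `mean_point` and `nearest` are assumptions made for this example. The last lines add a check that is not part of the walkthrough: they reassign every customer against the final centroids to see whether any assignment would change.

```python
import math

# Customers in the order they are processed above.
customers = [
    ("P1", (2, 15)), ("P2", (3, 18)), ("P3", (4, 12)), ("P4", (10, 80)),
    ("P5", (12, 85)), ("P6", (11, 78)), ("P7", (4, 16)), ("P8", (3, 14)),
]

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def nearest(point, centroids):
    """Label of the centroid closest to `point` (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

def sequential_kmeans(data, initial_centroids):
    """Assign points one at a time, updating the chosen centroid each time."""
    centroids = dict(initial_centroids)
    members = {label: [] for label in centroids}
    for name, point in data:
        label = nearest(point, centroids)
        members[label].append((name, point))
        centroids[label] = mean_point([p for _, p in members[label]])
    clusters = {label: [n for n, _ in pts] for label, pts in members.items()}
    return centroids, clusters

centroids, clusters = sequential_kmeans(
    customers, {"C1": (2, 15), "C2": (10, 80)}
)
print(centroids)  # {'C1': (3.2, 15.0), 'C2': (11.0, 81.0)}
print(clusters)   # {'C1': ['P1', 'P2', 'P3', 'P7', 'P8'], 'C2': ['P4', 'P5', 'P6']}

# Extra check: does any customer switch cluster under the final centroids?
stable = all(
    nearest(point, centroids) == label
    for label, names in clusters.items()
    for name, point in customers
    if name in names
)
print(stable)  # True
```

Because no customer changes cluster under the final centroids, a further pass over the data would leave the assignments as they are.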
