Illustration of K-Means Clustering Using a Sample Dataset

To understand how the K-Means clustering algorithm works, let us consider a small customer dataset.
Each customer is represented using two attributes: Annual Income and Spending Score.
The objective of K-Means is to group similar customers into clusters such that customers within the same cluster are more similar to each other than to those in other clusters.
In this example, we apply K-Means with K = 2 and calculate the centroid and cluster assignments step by step using Euclidean distance.

Distance Measure Used

Euclidean distance is the most commonly used measure, for two reasons:

It is simple to compute.
It works well for numerical data.

The rest of this section is a step-by-step solution of K-Means (with K = 2) on a small customer dataset with two features:

Feature 1 (x) → Annual Income (₹L)
Feature 2 (y) → Spending Score

Each customer is a 2-D point (x, y), and the goal is to group similar customers into clusters based on the distance between their points.

Given Data (Customers as points)

Customer	Income (x)	Spending (y)
P1	2	15
P2	3	18
P3	4	12
P4	10	80
P5	12	85
P6	11	78
P7	4	16
P8	3	14

Looking at the data, there are two natural groups:

Low income – low spending
High income – high spending

So K = 2.

STEP 1: Choose Number of Clusters

K = 2

👉 Because the data visually has the two natural groups listed above.

STEP 2: Initialize Centroids

Two customers are randomly picked as initial centroids:

Centroid C1 = (2, 15) → from P1
Centroid C2 = (10, 80) → from P4

These are written and circled in the images.
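
The random pick in this step can be sketched in Python as follows. This is a minimal illustration, not the procedure used to produce the worked example: the function name `pick_initial_centroids` and its `seed` parameter are assumptions introduced here, and a random draw will not necessarily return P1 and P4.

```python
import random

# Customer points (income, spending) copied from the data table, P1..P8.
points = [(2, 15), (3, 18), (4, 12), (10, 80),
          (12, 85), (11, 78), (4, 16), (3, 14)]

def pick_initial_centroids(points, k, seed=None):
    """Pick k distinct data points at random to serve as starting centroids.

    `seed` is a hypothetical convenience so that a run can be repeated.
    """
    rng = random.Random(seed)
    return rng.sample(points, k)

initial = pick_initial_centroids(points, k=2, seed=0)
print(initial)  # two of the eight customers, chosen at random
```

Different initial picks can lead to different intermediate centroids, which is why the worked example fixes C1 = P1 and C2 = P4 before continuing.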

STEP 3: Distance Formula Used

For every point, distance to each centroid is calculated using Euclidean distance:

$d = \sqrt{(x - c_x)^2 + (y - c_y)^2}$

Where:

$x, y$ → customer data

$c_x, c_y$ → centroid values
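
As a small sketch, the distance formula above translates directly into Python. The function name `euclidean` is chosen here for illustration only.

```python
import math

def euclidean(point, centroid):
    """Euclidean distance between a 2-D customer point and a centroid."""
    x, y = point
    cx, cy = centroid
    return math.sqrt((x - cx) ** 2 + (y - cy) ** 2)

# P2 = (3, 18) measured against the initial centroid C1 = (2, 15)
print(round(euclidean((3, 18), (2, 15)), 2))  # 3.16
```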

▶️ Point P1 = (2, 15)

Distance to C1 = (2, 15)

$\sqrt{(2-2)^2 + (15-15)^2} = \sqrt{0} = 0$

Distance to C2 = (10, 80)

$\sqrt{(2-10)^2 + (15-80)^2} = \sqrt{64 + 4225} = \sqrt{4289} \approx 65.49$

👉 P1 → Cluster C1

▶️ Point P2 = (3,18)

Distance to C1 = (2, 15)

$\sqrt{(3-2)^2 + (18-15)^2} = \sqrt{1 + 9} = \sqrt{10} \approx 3.16$

Distance to C2 = (10, 80)

$\sqrt{(3-10)^2 + (18-80)^2} = \sqrt{49 + 3844} = \sqrt{3893} \approx 62.39$

👉 P2 → Cluster C1
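
The assignment rule used for each point (compute the distance to every centroid, then pick the smaller one) can be sketched as below. The helper name `nearest_centroid` and the dictionary layout are assumptions made for this illustration; `math.dist` requires Python 3.8 or later.

```python
import math

def nearest_centroid(point, centroids):
    """Return the label of the centroid closest to `point` (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

# Initial centroids from Step 2
centroids = {"C1": (2, 15), "C2": (10, 80)}

print(nearest_centroid((3, 18), centroids))  # P2 -> C1
```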

🔹 Update C1 using P1 and P2

The centroid is the mean of the points assigned to it:

$C1_x = \frac{2 + 3}{2} = 2.5$

$C1_y = \frac{15 + 18}{2} = 16.5$

📌 New C1 = (2.5, 16.5)
C2 remains (10,80)
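
The centroid update can be written either as a plain mean over the cluster's points or, equivalently, as a running mean that folds one new point into the existing centroid. Both are sketched below; the function names are illustrative assumptions, not part of the original solution.

```python
def mean_point(points):
    """Centroid as the coordinate-wise mean of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def add_to_centroid(centroid, count, point):
    """Running-mean update: new = old + (point - old) / (count + 1).

    `count` is the number of points already averaged into `centroid`.
    Returns the updated centroid and the new count.
    """
    n = count + 1
    cx, cy = centroid
    x, y = point
    return (cx + (x - cx) / n, cy + (y - cy) / n), n

print(mean_point([(2, 15), (3, 18)]))            # (2.5, 16.5)
print(add_to_centroid((2, 15), 1, (3, 18)))      # ((2.5, 16.5), 2)
```

The running-mean form avoids re-summing the whole cluster each time a point is added, which matches the point-by-point style of this walkthrough.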

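▶️ Point P3 = (4, 12)

Distance to updated C1 = (2.5, 16.5)

$\sqrt{(4-2.5)^2 + (12-16.5)^2} = \sqrt{2.25 + 20.25} = \sqrt{22.5} \approx 4.743$

Distance to C2 = (10, 80)

$\sqrt{(4-10)^2 + (12-80)^2} = \sqrt{36 + 4624} = \sqrt{4660} \approx 68.26$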

👉 P3 → Cluster C1

🔹 Update C1 using P1, P2, P3

$C1_x = \frac{2 + 3 + 4}{3} = 3$

$C1_y = \frac{15 + 18 + 12}{3} = 15$

📌 C1 = (3,15)

▶️ Point P4 = (10,80)

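Distance to C1 = (3, 15)

$\sqrt{(10-3)^2 + (80-15)^2} = \sqrt{49 + 4225} = \sqrt{4274} \approx 65.38$

Distance to C2 = (10, 80)

$\sqrt{(10-10)^2 + (80-80)^2} = \sqrt{0} = 0$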

👉 P4 → Cluster C2

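▶️ Point P5 = (12, 85)

Distance to C1 = (3, 15)

$\sqrt{(12-3)^2 + (85-15)^2} = \sqrt{81 + 4900} = \sqrt{4981} \approx 70.58$

Distance to C2 = (10, 80)

$\sqrt{(12-10)^2 + (85-80)^2} = \sqrt{4 + 25} = \sqrt{29} \approx 5.39$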

👉 P5 → Cluster C2

🔹 Update C2 using P4 and P5

$C2_x = \frac{10 + 12}{2} = 11$

$C2_y = \frac{80 + 85}{2} = 82.5$

📌 C2 = (11,82.5)

▶️ Point P6 = (11,78)

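Distance to C1 = (3, 15)

$\sqrt{(11-3)^2 + (78-15)^2} = \sqrt{64 + 3969} = \sqrt{4033} \approx 63.51$

Distance to C2 = (11, 82.5)

$\sqrt{(11-11)^2 + (78-82.5)^2} = \sqrt{0 + 20.25} = 4.5$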

👉 P6 → Cluster C2

🔹 Update C2 using P4, P5, P6

$C2_x = \frac{10 + 12 + 11}{3} = 11$

$C2_y = \frac{80 + 85 + 78}{3} = 81$

📌 C2 = (11,81)

▶️ Point P7 = (4,16)

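Distance to C1 = (3, 15)

$\sqrt{(4-3)^2 + (16-15)^2} = \sqrt{1 + 1} = \sqrt{2} \approx 1.414$

Distance to C2 = (11, 81)

$\sqrt{(4-11)^2 + (16-81)^2} = \sqrt{49 + 4225} = \sqrt{4274} \approx 65.38$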

👉 P7 → Cluster C1

🔹 Update C1 using P1, P2, P3, P7

$C1_x = \frac{2 + 3 + 4 + 4}{4} = 3.25$

$C1_y = \frac{15 + 18 + 12 + 16}{4} = 15.25$

📌 C1 = (3.25,15.25)

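▶️ Point P8 = (3, 14)

Distance to C1 = (3.25, 15.25)

$\sqrt{(3-3.25)^2 + (14-15.25)^2} = \sqrt{0.0625 + 1.5625} = \sqrt{1.625} \approx 1.275$

Distance to C2 = (11, 81)

$\sqrt{(3-11)^2 + (14-81)^2} = \sqrt{64 + 4489} = \sqrt{4553} \approx 67.48$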

👉 P8 → Cluster C1

🔹 Final Update of C1

$C1_x = \frac{2 + 3 + 4 + 4 + 3}{5} = 3.2$

$C1_y = \frac{15 + 18 + 12 + 16 + 14}{5} = 15$

📌 Final C1 = (3.2,15)
📌 Final C2 = (11,81)

✅ FINAL ANSWER

Cluster C1 (Low income – Low spending)

Centroid = (3.2, 15)
Points = P1, P2, P3, P7, P8

Cluster C2 (High income – High spending)

Centroid = (11, 81)
Points = P4, P5, P6
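
To check the hand calculation, the whole walkthrough can be reproduced with a short Python sketch. It follows the same procedure as the steps above: points are visited in order P1 to P8, each is assigned to its nearest centroid, and that centroid is recomputed immediately. Note that this point-by-point update is a sequential variant; the more common batch formulation assigns all points first and only then recomputes the centroids. The function name `sequential_kmeans` and the `seeds` argument are assumptions made for this illustration.

```python
import math

# Customer data from the table: name -> (income, spending)
customers = {
    "P1": (2, 15), "P2": (3, 18), "P3": (4, 12), "P4": (10, 80),
    "P5": (12, 85), "P6": (11, 78), "P7": (4, 16), "P8": (3, 14),
}

def sequential_kmeans(customers, seeds):
    """One pass over the data, updating a centroid after every assignment.

    `seeds` maps a cluster label to the customer used as its initial centroid.
    """
    centroids = {label: customers[name] for label, name in seeds.items()}
    members = {label: [] for label in seeds}
    for name, point in customers.items():
        # Assign the point to the nearest current centroid.
        label = min(centroids, key=lambda c: math.dist(point, centroids[c]))
        members[label].append(name)
        # Recompute that centroid as the mean of its members so far.
        pts = [customers[m] for m in members[label]]
        centroids[label] = (sum(p[0] for p in pts) / len(pts),
                            sum(p[1] for p in pts) / len(pts))
    return centroids, members

centroids, members = sequential_kmeans(customers, {"C1": "P1", "C2": "P4"})
print(centroids)  # {'C1': (3.2, 15.0), 'C2': (11.0, 81.0)}
print(members)    # {'C1': ['P1', 'P2', 'P3', 'P7', 'P8'], 'C2': ['P4', 'P5', 'P6']}
```

With these final centroids, every customer is still closest to the centroid of its own cluster, so a further pass over the data would not change the assignments.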