Introduction to Clustering - AI Knowledge Hub

Have you ever noticed that customers in a mall don’t all behave the same way?
Some people earn a high salary but spend little,
some earn less but spend more,
and some are in between.

Can we automatically group similar customers without giving the computer any labels?

That’s exactly what clustering does.
One of the most popular clustering algorithms is K-Means.

What is Clustering?

Clustering is a technique used in data mining and machine learning to group similar data points together. In clustering, data points within the same group (called a cluster) are more similar to each other than to those in other groups.
It is an unsupervised learning method, meaning the data does not have predefined labels.

The main goal of clustering is to discover hidden patterns or structures in data by organizing it into meaningful groups.
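
To make "similar" concrete, a common choice is to measure the distance between data points: the smaller the distance, the more similar the points. The text above does not name a specific measure, so the Euclidean distance used here is an assumption, and the three points are hypothetical values made up for illustration.

```python
import math

# Hypothetical 2-D data points (not taken from the article).
p = (1, 2)
q = (2, 2)
r = (8, 9)

# Euclidean distance: a smaller value means the points are more similar.
d_pq = math.dist(p, q)
d_pr = math.dist(p, r)

print(f"distance p-q: {d_pq:.2f}")
print(f"distance p-r: {d_pr:.2f}")

# p is closer to q than to r, so a clustering algorithm would
# tend to put p and q in the same cluster and r in another.
same_cluster_candidate = "q" if d_pq < d_pr else "r"
print("p is most similar to:", same_cluster_candidate)
```

Here p and q are one unit apart while p and r are almost ten units apart, so p and q are the natural pair to group together.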

Example 1 of Clustering

Example: Student Study Hours

Suppose we have data about students based on the number of hours they study per day:

Student	Study Hours (per day)
A	1
B	2
C	2
D	6
E	7
F	8

If we apply clustering, the algorithm may group students as follows:

Cluster 1 (Low study hours): A, B, C
Cluster 2 (High study hours): D, E, F

Here:

Students in the same cluster have similar study habits
No labels like “low” or “high” were given in advance
The algorithm identified the groups automatically

This is an example of how clustering helps organize data based on similarity.
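
The grouping above can be reproduced with a small K-Means sketch. This is a minimal illustration in plain Python, not a production implementation. The function name kmeans_1d, the choice of two clusters, and the starting centroids (the smallest and largest study hours) are all assumptions made for this example; real K-Means usually picks starting centroids at random.

```python
# Study hours from the table above (student -> hours per day).
hours = {"A": 1, "B": 2, "C": 2, "D": 6, "E": 7, "F": 8}


def kmeans_1d(values, centroids, max_iters=100):
    """Minimal 1-D K-Means sketch: returns (centroids, label per value)."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iters):
        # Assignment step: each value joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centroids.append(sum(members) / len(members) if members else old)
        # Stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


values = list(hours.values())
# Assumed starting centroids: the smallest and largest value.
centroids, labels = kmeans_1d(values, centroids=[min(values), max(values)])

# Collect student names per cluster index.
clusters = {}
for name, lab in zip(hours, labels):
    clusters.setdefault(lab, []).append(name)

print(clusters)  # {0: ['A', 'B', 'C'], 1: ['D', 'E', 'F']}
```

Notice that the code never receives the words "low" or "high". It only sees the numbers, and the two groups fall out of the distances between them.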

Example 2 of Clustering: Salary and Spending Score

🛍️ Imagine a Mall Manager's Problem

“You are the manager of a shopping mall.”

You have customer data:

Salary
Spending Score

You want to:

Give discounts
Send offers
Identify premium customers

“Should we treat all customers the same?”

No!
Some earn more, some spend more, some are window shoppers.

So we need to GROUP customers automatically.

This grouping without labels is called clustering.

And one of the best-known clustering algorithms is K-Means.

Consider a shopping mall that records two features for each customer:

Annual Salary (₹ in lakhs)
Spending Score (a value from 1 to 100 showing how much a customer spends)

Customer	Annual Salary (₹ Lakhs)	Spending Score
C1	2	20
C2	3	25
C3	4	30
C4	8	80
C5	9	85
C6	10	90

When clustering is applied to this dataset:

Cluster 1: Low Salary – Low Spending

Customers: C1, C2, C3
Characteristics:
- Lower income
- Low spending behavior

Cluster 2: High Salary – High Spending

Customers: C4, C5, C6
Characteristics:
- Higher income
- High spending behavior

Explanation

Customers within the same cluster have similar salary and spending patterns
No labels (such as “low spender” or “high spender”) were given initially
The clustering algorithm automatically grouped customers based on similarity
This helps businesses identify different customer segments
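
The same idea can be sketched in code for the two-feature customer table. This is a minimal plain-Python K-Means, offered as an illustration under assumptions rather than a definitive implementation: the function name kmeans, the choice of two clusters, and the use of C1 and C6 as starting centroids are all made up for this example.

```python
import math

# Customer data from the table above: (annual salary in lakhs, spending score).
customers = {
    "C1": (2, 20), "C2": (3, 25), "C3": (4, 30),
    "C4": (8, 80), "C5": (9, 85), "C6": (10, 90),
}


def kmeans(points, centroids, max_iters=100):
    """Minimal K-Means sketch: returns (centroids, label per point)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid
        # (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)
        # Stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


points = list(customers.values())
# Assumed starting centroids: the first and last customer in the table.
centroids, labels = kmeans(points, centroids=[points[0], points[-1]])

segments = {}
for name, lab in zip(customers, labels):
    segments.setdefault(lab, []).append(name)

print(segments)   # {0: ['C1', 'C2', 'C3'], 1: ['C4', 'C5', 'C6']}
print(centroids)  # [(3.0, 25.0), (9.0, 85.0)]
```

Each final centroid is the "average customer" of its segment: about ₹3 lakhs salary with a spending score of 25 for the first group, and about ₹9 lakhs with a score of 85 for the second. In practice, features on very different scales are often rescaled before clustering so that one feature does not dominate the distance; this sketch skips that step because the two groups here are already far apart.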

Why This Example Is Important

Helps in customer segmentation
Supports targeted marketing
Improves business strategies