Have you ever noticed that customers in a mall don’t all behave the same way?
Some people earn a high salary but spend little,
some earn less but spend more,
and some are in between.
Can we automatically group similar customers without telling the computer any labels?
That’s exactly what clustering does.
And one of the most popular clustering algorithms is K-Means.
What is Clustering?
Clustering is a technique used in data mining and machine learning to group similar data points together. In clustering, data points within the same group (called a cluster) are more similar to each other than to those in other groups.
It is an unsupervised learning method, meaning the data does not have predefined labels.
The main goal of clustering is to discover hidden patterns or structures in data by organizing it into meaningful groups.
Example 1 of Clustering
Example: Student Study Hours
Suppose we have data about students based on the number of hours they study per day:
| Student | Study Hours (per day) |
| --- | --- |
| A | 1 |
| B | 2 |
| C | 2 |
| D | 6 |
| E | 7 |
| F | 8 |
If we apply clustering, the algorithm may group students as follows:
- Cluster 1 (Low study hours): A, B, C
- Cluster 2 (High study hours): D, E, F
Here:
- Students in the same cluster have similar study habits
- No labels like “low” or “high” were given in advance
- The algorithm identified the groups automatically
This is an example of how clustering helps organize data based on similarity.
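To make the grouping concrete, here is a minimal K-Means sketch in plain Python for the study-hours table. It is an illustration under stated assumptions, not a reference implementation: the helper name `kmeans_1d` is made up for this example, the number of clusters is fixed at two, and the starting centroids are simply the smallest and largest values so that the run is repeatable.

```python
# Study hours per student, taken from the table above.
hours = {"A": 1, "B": 2, "C": 2, "D": 6, "E": 7, "F": 8}


def kmeans_1d(values, centroids, max_iter=100):
    """Tiny 1-D K-Means (hypothetical helper for this example)."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else old)
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


names = list(hours)
values = [hours[n] for n in names]

# Assumption: seed with the smallest and largest value (two clusters).
labels, centroids = kmeans_1d(values, [min(values), max(values)])

clusters = {}
for name, lab in zip(names, labels):
    clusters.setdefault(lab, []).append(name)

print(clusters)  # {0: ['A', 'B', 'C'], 1: ['D', 'E', 'F']}
```

Note that the code only ever sees the numbers. The names "low" and "high" are something we attach to clusters 0 and 1 afterwards, by looking at their centroids (about 1.67 and 7.0 hours).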
Example 2 of Clustering: Salary and Spending Score
🛍️ The Mall Manager’s Problem
Suppose you are the manager of a shopping mall.
You have customer data:
- Salary
- Spending Score
You want to:
- Give discounts
- Send offers
- Identify premium customers
Should we treat all customers the same?
No!
Some earn more, some spend more, and some are window shoppers.
So we need to group customers automatically.
This grouping without labels is called clustering,
and K-Means is one of the best-known algorithms for it.
Consider a shopping mall that collects data about customers based on two features:
- Annual Salary (₹ in lakhs)
- Spending Score (a value from 1 to 100 showing how much a customer spends)
| Customer | Annual Salary (₹ Lakhs) | Spending Score (1–100) |
| --- | --- | --- |
| C1 | 2 | 20 |
| C2 | 3 | 25 |
| C3 | 4 | 30 |
| C4 | 8 | 80 |
| C5 | 9 | 85 |
| C6 | 10 | 90 |
When clustering is applied to this dataset:
Cluster 1: Low Salary – Low Spending
- Customers: C1, C2, C3
- Characteristics:
  - Lower income
  - Low spending behavior
Cluster 2: High Salary – High Spending
- Customers: C4, C5, C6
- Characteristics:
  - Higher income
  - High spending behavior
Explanation
- Customers within the same cluster have similar salary and spending patterns
- No labels (such as “low spender” or “high spender”) were given initially
- The clustering algorithm automatically grouped customers based on similarity
- This helps businesses identify different customer segments
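The same idea can be sketched for the two-feature customer table. This is a rough illustration rather than a production approach: the `kmeans` helper is hypothetical, the number of clusters is assumed to be two, and the first and last customers are used as starting centroids to keep the run deterministic. It measures similarity with plain Euclidean distance on the raw values, so the spending score (1 to 100) weighs more heavily than salary (2 to 10); on real data you would usually rescale the features first.

```python
import math

# Customer data from the table above:
# (annual salary in lakhs of rupees, spending score).
customers = {
    "C1": (2, 20), "C2": (3, 25), "C3": (4, 30),
    "C4": (8, 80), "C5": (9, 85), "C6": (10, 90),
}


def kmeans(points, centroids, max_iter=100):
    """Tiny K-Means for n-dimensional points (hypothetical helper)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the per-feature mean of its members
        # (an empty cluster keeps its old centroid).
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


names = list(customers)
points = [customers[n] for n in names]

# Assumption: seed with the first and last customer (two clusters).
labels, centroids = kmeans(points, [points[0], points[-1]])

segments = {}
for name, lab in zip(names, labels):
    segments.setdefault(lab, []).append(name)

print(segments)   # {0: ['C1', 'C2', 'C3'], 1: ['C4', 'C5', 'C6']}
print(centroids)  # [(3.0, 25.0), (9.0, 85.0)]
```

Each centroid acts as a summary of its segment: roughly ₹3 lakhs with a score of 25 for the first group, and ₹9 lakhs with a score of 85 for the second.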
Why This Example Is Important
- Helps in customer segmentation
- Supports targeted marketing
- Improves business strategies
