A centroid is the coordinate-wise mean (average) of all data points in a cluster and serves as the single point that represents that cluster.
Mathematical Definition
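If a cluster has $n$ points:
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
Then the centroid $(C_x, C_y)$ is:
$C_x = \frac{x_1 + x_2 + \cdots + x_n}{n}$
$C_y = \frac{y_1 + y_2 + \cdots + y_n}{n}$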
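The definition above can also be written compactly in vector form, which extends unchanged to data with more than two features. This is only a sketch in notation the notes themselves do not use: $\mathbf{p}_i$ for the $i$-th point and $\mathbf{c}$ for the centroid are symbols assumed here for illustration.

```latex
\mathbf{c} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{p}_i,
\qquad
\mathbf{p}_i = (x_i, y_i)
\;\Rightarrow\;
\mathbf{c} = \left( \frac{1}{n} \sum_{i=1}^{n} x_i,\; \frac{1}{n} \sum_{i=1}^{n} y_i \right) = (C_x, C_y)
```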
Example
Cluster C1 contains points:
- P1 (2,15)
- P2 (3,18)
- P3 (4,12)
- P7 (4,16)
- P8 (3,14)
Centroid calculation:
$C_x = \frac{2 + 3 + 4 + 4 + 3}{5} = \frac{16}{5} = 3.2$
$C_y = \frac{15 + 18 + 12 + 16 + 14}{5} = \frac{75}{5} = 15$
Centroid = (3.2, 15)
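The calculation above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `centroid` and the variable `c1` are chosen here for illustration and are not part of any library.

```python
from statistics import fmean


def centroid(points):
    """Return the coordinate-wise mean of a non-empty list of equal-length tuples."""
    if not points:
        raise ValueError("a centroid needs at least one point")
    # zip(*points) groups all x values together, then all y values, and so on.
    return tuple(fmean(axis) for axis in zip(*points))


# Cluster C1 from the example: P1, P2, P3, P7, P8.
c1 = [(2, 15), (3, 18), (4, 12), (4, 16), (3, 14)]
print(centroid(c1))  # (3.2, 15.0)
```

Because the mean is taken per coordinate, the same function works for points with any number of features, not just two.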
🔹 Why Is the Centroid Important in K-Means?
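- It represents the center of the cluster
- Each data point is assigned to the cluster whose centroid is nearest to it
- It is recomputed after every assignment step until it stops changing (convergence)
- For a fixed assignment, the mean is the point that minimizes the within-cluster sum of squared distances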
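The roles listed above can be seen together in a small K-means loop. The following is a rough sketch under stated assumptions, not a reference implementation: the helper names (`nearest`, `mean_point`, `k_means`, `wcss`), the Euclidean distance, the rule that an empty cluster keeps its old centroid, and the six sample points are all made up here for illustration.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


def wcss(clusters, centroids):
    """Within-cluster sum of squared distances to each cluster's centroid."""
    return sum(math.dist(p, c) ** 2
               for pts, c in zip(clusters, centroids) for p in pts)


def k_means(points, centroids, max_iter=100):
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        # Assumption: an empty cluster simply keeps its previous centroid.
        updated = [mean_point(c) if c else old
                   for c, old in zip(clusters, centroids)]
        if updated == centroids:  # centroids stopped changing
            break
        centroids = updated
    return centroids, clusters


# Hypothetical sample data, not taken from the notes.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
initial = [(1, 1), (8, 8)]
final, clusters = k_means(points, initial)
print(final)
print(wcss(clusters, initial), wcss(clusters, final))
```

On this sample the assignment does not change after the first pass, so the loop stops on the second iteration, and moving each centroid to its cluster mean lowers the within-cluster sum of squared distances compared with the initial guesses.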
