Pure and Impure Nodes in CART - AI Knowledge Hub

While constructing a decision tree using the CART algorithm, the dataset is split into
smaller subsets at each node.
After every split, the child nodes may contain samples belonging to a single class or multiple
classes.
Based on this, nodes in CART are classified as pure nodes or impure nodes.
The Gini Index is used to measure the purity or impurity of a node.
Gini Index in CART
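The Gini Index measures the probability that a sample drawn at random from a node would be misclassified if it were labeled at random according to the node's class distribution.
Formula:

$$\text{Gini} = 1 - \sum_{i=1}^{k} p_i^2$$

where $p_i$ is the proportion of samples in the node that belong to class $i$, and $k$ is the number of classes.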
Gini = 0 → Pure node
Lower Gini → Better split
The split with the minimum weighted Gini Index (the size-weighted average of the child nodes' Gini values) is selected.
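The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `gini_index` and the choice to treat an empty node as pure are assumptions made for this example.

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity of a node, given the class labels of its samples."""
    total = len(labels)
    if total == 0:
        return 0.0  # assumption: an empty node is treated as pure
    counts = Counter(labels)
    # 1 minus the sum of squared class proportions
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


pure_node = ["Y", "Y", "Y"]
mixed_node = ["Y", "N", "Y", "N"]
print(gini_index(pure_node))   # 0.0 -> pure node
print(gini_index(mixed_node))  # 0.5 -> evenly mixed binary node
```

A node whose samples all share one label gives 0, and an evenly mixed two-class node gives 0.5.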
Gini Index (CART)
Given Data (same as earlier)
From the table:
Total samples = 9
Y = 6, N = 3
After splitting on the feature, we get two child nodes:
C1 → 3Y / 3N
C2 → 3Y / 0N

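Step 1: Formula for Gini Index

$$\text{Gini} = 1 - \sum_{i=1}^{k} p_i^2 = 1 - p_Y^2 - p_N^2 \quad \text{(two classes, Y and N)}$$

Step 2: Gini Index for Root Node (6Y / 3N)

$$\text{Gini}_{\text{Root}} = 1 - (6/9)^2 - (3/9)^2 = 1 - 0.444 - 0.111 \approx 0.44$$

Root node is IMPURE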
Step 3: Gini Index for Child Node C1 (3Y / 3N)

$$\text{Gini}_{C1} = 1 - (3/6)^2 - (3/6)^2 = 1 - 0.25 - 0.25 = 0.5$$

Maximum Gini impurity for two classes → Highly impure node
Step 4: Gini Index for Child Node C2 (3Y / 0N)

$$\text{Gini}_{C2} = 1 - (3/3)^2 - (0/3)^2 = 1 - 1 - 0 = 0$$

PURE node

Step 5: Interpretation (Pure vs Impure using Gini)

Node	Y	N	Gini Index	Type
Root	6	3	0.44	Impure
C1	3	3	0.5	Highly Impure
C2	3	0	0	Pure
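
The values in the table can be checked with a short script. This is a sketch only: `gini_from_counts` and the `nodes` dictionary are hypothetical names introduced here, and the counts are the (Y, N) pairs from the example.

```python
def gini_from_counts(counts):
    """Gini impurity from per-class sample counts, e.g. (Y, N)."""
    total = sum(counts)
    if total == 0:
        return 0.0  # assumption: an empty node is treated as pure
    return 1.0 - sum((c / total) ** 2 for c in counts)


# (Y, N) counts for each node in the worked example
nodes = {"Root": (6, 3), "C1": (3, 3), "C2": (3, 0)}

for name, (y, n) in nodes.items():
    g = gini_from_counts((y, n))
    kind = "Pure" if g == 0 else "Impure"
    print(f"{name}: Y={y} N={n} Gini={g:.2f} {kind}")
```

The script labels a node "Pure" only when its Gini value is exactly 0; the "Highly Impure" label in the table is a qualitative description of the 0.5 case, not a separate category in the code.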

Important Points
 Gini = 0 → Pure node
 Gini = 0.5 → Maximum impurity (binary class)
 CART selects the split with minimum weighted Gini
 Gini is generally faster to compute than entropy because it needs no logarithms
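
To illustrate the weighted Gini that CART minimizes, the sketch below evaluates the example split (C1 = 3Y / 3N, C2 = 3Y / 0N). Each child's Gini is weighted by its share of the parent's samples. The helper names `gini_from_counts` and `weighted_gini` are assumptions made for this illustration.

```python
def gini_from_counts(counts):
    """Gini impurity from per-class sample counts, e.g. (Y, N)."""
    total = sum(counts)
    if total == 0:
        return 0.0  # assumption: an empty node is treated as pure
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(children):
    """Size-weighted average of the children's Gini values."""
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * gini_from_counts(c) for c in children)


root = (6, 3)
split = [(3, 3), (3, 0)]  # C1 and C2 as (Y, N) counts

w = weighted_gini(split)              # (6/9) * 0.5 + (3/9) * 0
reduction = gini_from_counts(root) - w
print(round(w, 3))          # 0.333
print(round(reduction, 3))  # 0.111
```

The weighted Gini of the split (about 0.333) is lower than the root's Gini (about 0.444), so the split reduces impurity. When several candidate splits are compared, the one with the smallest weighted value would be chosen.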