Pure and Impure Split using Entropy (with Example)


Imagine separating apples and oranges.
If one basket has only apples, the separation is perfect.
If another basket has apples and oranges mixed together, the separation is messy.


Decision trees face the same situation.
Entropy helps the tree decide whether a split is clean (pure) or messy (impure).


Based on entropy values, splits are classified as pure or impure.


Definition of Pure and Impure Split


 Pure Split
→ All samples belong to the same class
→ Entropy = 0


 Impure Split
→ Samples belong to more than one class
→ Entropy > 0
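
The two definitions can be checked side by side with a minimal Python sketch. The helper names `entropy` and `is_pure` are illustrative, not from the original notes, and the sketch assumes base-2 entropy over a plain list of class labels.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    # Only classes that actually occur are counted, so log2(0) never arises.
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def is_pure(labels):
    """A node is pure when every sample carries the same class label."""
    return len(set(labels)) == 1

for labels in (["Y", "Y", "Y"], ["Y", "N", "Y"]):
    kind = "pure" if is_pure(labels) else "impure"
    print(labels, kind, round(entropy(labels), 3))
```

A node with a single class always has entropy 0, so the two checks agree: `is_pure` is true exactly when `entropy` returns 0.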

Suppose the given data is:

Feature   O/P (Class)
C1        Y
C2        Y
C1        Y
C2        Y
C1        Y
C1        N
C2        Y
C1        N
C1        N

Overall Class Distribution

  • Total samples = 9
  • Yes (Y) = 6
  • No (N) = 3


Feature-wise Split (as needed for entropy)

Feature   Y   N   Total
C1        3   3   6
C2        3   0   3
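
These counts can be verified by tallying the nine rows listed above. The sketch below is one possible way to do it; the variable names `data`, `overall`, and `per_feature` are assumptions made for illustration.

```python
from collections import Counter

# The nine (feature, class) rows from the data listed above.
data = [("C1", "Y"), ("C2", "Y"), ("C1", "Y"), ("C2", "Y"), ("C1", "Y"),
        ("C1", "N"), ("C2", "Y"), ("C1", "N"), ("C1", "N")]

# Overall class distribution (ignores the feature column).
overall = Counter(label for _, label in data)

# Class distribution inside each feature value.
per_feature = {}
for feature, label in data:
    per_feature.setdefault(feature, Counter())[label] += 1

print(len(data), dict(overall))
print({feature: dict(counts) for feature, counts in per_feature.items()})
```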


Step 1: Entropy of Root Node (Before Split)


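Formula of Entropy

$$\text{Entropy}(S) = -\sum_{i} p_i \log_2 p_i$$

where p_i is the proportion of samples in S that belong to class i.

For the root node, p(Y) = 6/9 and p(N) = 3/9:

$$\text{Entropy}(\text{Root}) = -\frac{6}{9}\log_2\frac{6}{9} - \frac{3}{9}\log_2\frac{3}{9} \approx 0.918$$

Since entropy > 0, the node is IMPURE.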
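
The root-node value can be confirmed numerically. This is a small sketch that works directly from the class counts (6 Yes, 3 No); the function name `entropy_from_counts` is hypothetical.

```python
from math import log2

def entropy_from_counts(*counts):
    """Base-2 entropy from per-class sample counts."""
    total = sum(counts)
    # Empty classes are skipped, following the convention 0 * log2(0) = 0.
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

root = entropy_from_counts(6, 3)
print(round(root, 3))  # 0.918
```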


Step 2: Split on Feature → C1 and C2


Split 1: Node C1
From the data:

  • Y = 3
  • N = 3
  • Total = 6

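$$\text{Entropy}(C1) = -\frac{3}{6}\log_2\frac{3}{6} - \frac{3}{6}\log_2\frac{3}{6} = 1$$

Maximum entropy (1 for two classes) → Completely IMPURE node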


Split 2: Node C2
From the data:

  • Y = 3
  • N = 0
  • Total = 3

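$$\text{Entropy}(C2) = -\frac{3}{3}\log_2\frac{3}{3} - 0 = 0$$

(taking 0 · log2(0) = 0 for the empty class)

Entropy = 0 → PURE node


Step 3: Interpretation (Pure vs Impure)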


Pure Split

  • All samples belong to one class only
  • Entropy = 0
  • No further splitting needed
  • In this example:
    • C2 node (0) → Pure

Impure Split

  • Samples belong to multiple classes
  • Entropy > 0
  • Further splitting required (if other features are available)
  • In this example:
    • Root node (0.918) → Impure
    • C1 node (1) → Highly impure

Step 4: Summary Table

Node   Y   N   Entropy   Type
Root   6   3   0.918     Impure
C1     3   3   1.0       Impure
C2     3   0   0         Pure
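
The summary table can be reproduced from the raw rows. The sketch below is one way to do it, assuming the nine-row dataset given earlier; `nodes` and `entropy` are illustrative names, not part of the original notes.

```python
from collections import Counter
from math import log2

# The nine (feature, class) rows from the example dataset.
data = [("C1", "Y"), ("C2", "Y"), ("C1", "Y"), ("C2", "Y"), ("C1", "Y"),
        ("C1", "N"), ("C2", "Y"), ("C1", "N"), ("C1", "N")]

def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

# The root holds every label; each feature value forms one child node.
nodes = {"Root": [label for _, label in data]}
for feature, label in data:
    nodes.setdefault(feature, []).append(label)

print(f"{'Node':<6}{'Y':>2}{'N':>3}{'Entropy':>9}  Type")
for name, labels in nodes.items():
    counts = Counter(labels)
    h = entropy(labels)
    kind = "Pure" if h == 0 else "Impure"
    print(f"{name:<6}{counts['Y']:>2}{counts['N']:>3}{h:>9.3f}  {kind}")
```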

Thus, ID3 constructs a decision tree by selecting attributes that provide maximum
information gain, resulting in an efficient and interpretable classification model.
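
The notes do not compute the information gain for this split, so the following is a sketch under the standard definition: the root entropy minus the size-weighted average of the child entropies. The names `entropy`, `children`, and `gain` are assumptions for illustration.

```python
from math import log2

def entropy(*counts):
    """Base-2 entropy from per-class sample counts."""
    total = sum(counts)
    # Empty classes are skipped, following the convention 0 * log2(0) = 0.
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

root = entropy(6, 3)                        # 6 Yes, 3 No before the split
children = {"C1": (3, 3), "C2": (3, 0)}     # (Yes, No) counts in each child
total = 9

# Each child's entropy is weighted by its share of the samples.
weighted = sum(sum(c) / total * entropy(*c) for c in children.values())
gain = root - weighted
print(round(gain, 3))  # 0.252
```

Here the gain works out to about 0.918 − (6/9)(1) − (3/9)(0) ≈ 0.252. With several candidate features, ID3 would compute this value for each one and split on the largest.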


Key Points

  • Pure node → Entropy = 0
  • Impure node → Entropy > 0
  • Decision trees try to create pure child nodes
  • ID3 uses entropy and information gain
  • Best split is the one that reduces impurity the most