Decision Tree using ID3 (Information Gain): Example


Suppose we need to decide whether a given image is a “Winter Family Photo” or “Not a Winter Family Photo”.


Step 1: Identify the Goal (Target Variable)

Target / Output class:

  • Winter Family Photo
  • Not a Winter Family Photo


This is a binary classification problem.


Step 2: Identify the Features (Attributes)

From the given images, we consider the following features:

  1. Cartoon Character? (Yes / No)
  2. Winter? (Yes / No) → snow, jackets, winter background
  3. More than 1 person? (Yes / No)

These are independent features used to make decisions.
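The pictures themselves are not reproduced here, so for the code sketches that follow, here is a hypothetical 8-image dataset in Python whose counts match the walkthrough later in this note (4 cartoons, 3 non-winter images, 3 single-person images). The exact rows are an illustrative assumption, not taken from the original image:

```python
# Hypothetical 8-image dataset. "WFP" abbreviates "Winter Family Photo".
# The rows are assumed for illustration; only the counts are taken from
# the walkthrough below (4 cartoons, 3 non-winter, 3 single-person).
dataset = [
    {"cartoon": True,  "winter": True,  "multi_person": True,  "label": "Not WFP"},
    {"cartoon": True,  "winter": True,  "multi_person": False, "label": "Not WFP"},
    {"cartoon": True,  "winter": False, "multi_person": True,  "label": "Not WFP"},
    {"cartoon": True,  "winter": False, "multi_person": False, "label": "Not WFP"},
    {"cartoon": False, "winter": True,  "multi_person": True,  "label": "WFP"},
    {"cartoon": False, "winter": True,  "multi_person": True,  "label": "WFP"},
    {"cartoon": False, "winter": True,  "multi_person": False, "label": "Not WFP"},
    {"cartoon": False, "winter": False, "multi_person": True,  "label": "Not WFP"},
]
```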

Step 3: Build the Decision Tree (Top-Down Approach)

Root Node Selection using Information Gain

In decision trees built with the ID3 algorithm,
👉 the feature with the highest Information Gain is chosen as the root node.

1️⃣ First Split: Cartoon Character?

  • Cartoons are not real family photos
  • This split removes many incorrect images immediately
  • Highest Information Gain

So, the root node is:

Cartoon Character?

If YES → Not a winter family photo
If NO → Continue splitting

2️⃣ Second Split: Winter?
For non-cartoon images:
Winter?
If NO → Not a winter family photo (e.g., summer images)

If YES → Continue splitting

3️⃣ Third Split: More than 1 person?
Because a family photo must contain more than one person:
More than 1 person?
If YES → ✔ Winter family photo
If NO → ✖ Not a winter family photo
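Putting the three splits together, the finished tree can be written as nested conditionals. This is a minimal sketch using the field names of the hypothetical dataset from Step 2:

```python
def classify(image):
    """Hand-built tree: Cartoon? -> Winter? -> More than 1 person?"""
    if image["cartoon"]:
        return "Not WFP"   # cartoons are never real family photos
    if not image["winter"]:
        return "Not WFP"   # no snow / jackets / winter background
    if image["multi_person"]:
        return "WFP"       # real photo, winter scene, several people
    return "Not WFP"       # a single person does not make a family photo
```

Running classify over every row of the sketch dataset reproduces its labels exactly.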

Step 4: Role of Information Gain (Concept Explanation)
What is Information Gain?
Information Gain = Reduction in Entropy

  • Measures how well a feature separates the data
  • Higher Information Gain → better feature for splitting

Why is “Cartoon Character?” chosen first?
  • Splitting on cartoon images immediately classifies a large group of pictures
  • It reduces uncertainty the most
  • Hence, it has the highest Information Gain

Other features like Winter and Number of persons give less gain initially, so they appear
lower in the tree.
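For reference, the entropy-based gain that ID3 actually computes can be sketched in Python. This is the textbook formula (the image walkthrough below uses a simplified picture count instead), and the helper names are my own:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy: H(S) = -sum_c p_c * log2(p_c)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, feature):
    """ID3 gain: IG(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v)."""
    base = entropy([r["label"] for r in rows])
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r["label"] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder
```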

What the image shows


The image compares three possible features to decide which one should be the root node of a
decision tree using Information Gain.

  • Total images (dataset) = 8 pictures
  • Goal: Select the best feature that separates the data most clearly
  • Information Gain = Base Entropy − New Entropy (the full formula is given below)
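In the full ID3 algorithm, the “new entropy” is a size-weighted average over the branches:

IG(S, A) = H(S) − Σᵥ (|Sᵥ| / |S|) · H(Sᵥ)

where H is the entropy of the class labels and Sᵥ is the subset of images with value v for feature A. The image simplifies this to raw picture counts (8 before the split, N still unclassified after), which is enough to compare the three candidate features here.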

Step 1: Initial Situation (Base Entropy)
At the start:
  • We have 8 mixed pictures
  • Some are winter family photos
  • Some are not
Because the data is mixed, entropy is high (high uncertainty).
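With, say, 2 winter family photos out of the 8 (the split assumed in the sketch dataset above), the base entropy works out to:

H = −(2/8)·log₂(2/8) − (6/8)·log₂(6/8) ≈ 0.811 bits

Any mix of the two classes gives an entropy well above 0; a pure group gives exactly 0.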

Step 2: Try Different Splits (Three Features)
The image shows three candidate features:
1️⃣ Cartoon Character?
2️⃣ Winter?
3️⃣ More than 1 person?
Each feature is tested to see how much it reduces uncertainty.

Case 1: Split using Cartoon Character?
After splitting:
  • Cartoon images are immediately classified as Not a winter family photo
  • Remaining images = 4
Uncertainty is reduced a lot.
Result shown in image:
  • We had 8 pictures
  • Now just 4 are left
  • Information Gain = 8 − 4 = 4 (highest; the image counts gain as the number of pictures immediately classified)
This means Cartoon Character? separates the data best.

Case 2: Split using Winter?
After splitting:
  • Some images are still mixed
  • Remaining images = 5
More uncertainty remains than in Case 1.
Result shown in image:
  • We had 8 pictures
  • Now 5 are left
  • Information Gain = 8 − 5 = 3
A good split, but not the best.

Case 3: Split using More than 1 person?
After splitting:
  • Mixed images still remain
  • Remaining images = 5
Less effective at reducing uncertainty.
Result shown in image:
  • We had 8 pictures
  • Now 5 are left
  • Information Gain = 8 − 5 = 3
The same gain as Winter?, but still lower than Cartoon Character?.
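To check that the simplified picture counts and the textbook formula agree on the ranking, the sketch below reuses the hypothetical dataset and the information_gain helper from the earlier snippets (the exact numbers depend on the assumed rows):

```python
def images_classified(rows, feature):
    """Simplified 'gain' from the image: how many pictures fall into a
    branch that is already pure (one label) and needs no further splits."""
    classified = 0
    for value in {r[feature] for r in rows}:
        branch = [r["label"] for r in rows if r[feature] == value]
        if len(set(branch)) == 1:
            classified += len(branch)
    return classified

for feature in ("cartoon", "winter", "multi_person"):
    print(feature, images_classified(dataset, feature),
          round(information_gain(dataset, feature), 3))
# cartoon      4 0.311   <- best under both measures
# winter       3 0.204
# multi_person 3 0.204
```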

Step 3: Selecting the Root Node
Rule: the feature with the highest Information Gain becomes the root node.
From the image:
  • Cartoon Character? → IG = 4
  • Winter? → IG = 3
  • More than 1 person? → IG = 3
Final Decision: Cartoon Character? is selected as the root node.

What This Image Teaches
  • Information Gain measures how much uncertainty is reduced
  • A bigger reduction means a better feature
  • At each node, decision trees choose the feature with maximum Information Gain
  • This is how the ID3 algorithm works
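As a closing usage note: scikit-learn's DecisionTreeClassifier grows trees by the same maximize-gain principle when criterion="entropy" is set (strictly, it builds CART-style binary trees, a close relative of ID3 rather than the identical algorithm). A minimal sketch on the hypothetical dataset above:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Encode the boolean features as a matrix and the labels as a vector.
X = [[r["cartoon"], r["winter"], r["multi_person"]] for r in dataset]
y = [r["label"] for r in dataset]

clf = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(clf, feature_names=["cartoon", "winter", "multi_person"]))
```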