Activation Function (the “final decision step”)

Now, after adding everything up, you still need to decide:

  • If the total is high enough → “Yes, I’ll go to the movie.” 
  • If the total is too low → “No, I’ll stay home.” 

This decision-making step is the activation function.
It acts like a gate that checks: Should I pass the signal forward or not?

Put Together (Simple Story)

Linear transformation = collecting opinions and weighting them.

Activation function = making the final yes/no decision.

Super-Simple Example for Class

Ask students:

“If you have 80 marks in your exams, good attendance, and you study daily, will you pass?”

Each of these is an input.

Some inputs matter more (exam marks > attendance).

You add them up = linear transformation.

Then you ask: “Is the total good enough to pass the cutoff line?”

That’s the activation function.

What is an activation function in ANN (actually)?

After the linear transformation (weighting the inputs and adding them up), the ANN needs a mathematical function to decide:

  • Should this neuron be active (fire) or inactive?
  • How strongly should it pass the signal forward?

That deciding function is the activation function.

Common Activation Functions (explained simply)

Step Function

f(x) = 1, if x ≥ 0
f(x) = 0, if x < 0

Very simple: If input is above a threshold → output 1 (fire).

If below → output 0 (no fire).

Like a light switch  (on/off).
Problem: Too rigid, not used much today.
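
As a quick sketch (the function name and default threshold here are illustrative, not standard library code), a step activation in Python looks like this:

```python
# Illustrative step activation: fire (1) at or above the threshold, stay silent (0) below it.
def step(x, threshold=0.0):
    return 1 if x >= threshold else 0

print(step(0.7))   # 1 -> fire
print(step(-0.3))  # 0 -> no fire
```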

Sigmoid Function

Formula: f(z) = 1 / (1 + e^(−z))

Smoothly squashes values between 0 and 1.

Small input → close to 0

Large input → close to 1

Looks like an S-curve.
Example: Probability of being a cat vs not a cat.
Limitation: Can be slow and cause “vanishing gradients.”
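
A minimal Python sketch of the sigmoid, just to show the squashing behaviour (the function name is illustrative):

```python
import math

# Sigmoid squashes any real number into the range (0, 1).
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

print(sigmoid(-5))  # ~0.007, close to 0
print(sigmoid(0))   # 0.5
print(sigmoid(5))   # ~0.993, close to 1
```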

tanh (Hyperbolic Tangent)

f(z) = (e^z − e^(−z)) / (e^z + e^(−z))

Similar to sigmoid, but outputs between –1 and +1.

Negative inputs → negative outputs.

Positive inputs → positive outputs.
Helps when signals need to be centered around 0.
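
A minimal sketch of tanh written out from the formula above (in practice you would simply call math.tanh):

```python
import math

# tanh maps any real input into the range (-1, 1), centred around 0.
def tanh(z):
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

print(tanh(-2))  # ~ -0.96, negative input -> negative output
print(tanh(0))   # 0.0, centred around zero
print(tanh(2))   # ~ +0.96, positive input -> positive output
```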

ReLU (Rectified Linear Unit) 🚀 (most common today)

Formula: f(z) = max(0, z)

If input is positive → pass it forward.

If input is negative → output 0.

Like a filter: only keeps useful positive signals.
Fast, simple, and works great in deep networks.
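
ReLU is short enough to sketch in a couple of lines of Python (again, the name is just for illustration):

```python
# ReLU: keep positive signals, cut negative ones to 0.
def relu(z):
    return max(0.0, z)

print(relu(3.2))   # 3.2 -> passed through unchanged
print(relu(-1.5))  # 0.0 -> filtered out
```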

A classroom analogy for the four functions (grading styles):

Step → Teacher says: “Pass/Fail only, no grades.”

Sigmoid → Teacher gives grades between 0–100, but squashed into 0–1.

tanh → Grades can be negative (bad performance) or positive (good performance).

ReLU → Teacher ignores bad scores (negative = 0) and only cares about positive effort.

Threshold Decision

In step function, the threshold is fixed (e.g., 0).

In sigmoid/tanh, the “threshold” is smooth: an output closer to 1 means a strongly positive signal, while an output closer to 0 (or –1 for tanh) means a weak or negative one.

In ReLU, there’s no threshold in the usual sense — negative values are cut to 0, positives just go through.
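
To see these threshold behaviours side by side, here is a small comparison sketch (the sample inputs are arbitrary, and the helper functions repeat the ones sketched above):

```python
import math

def step(z):
    return 1 if z >= 0 else 0

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def relu(z):
    return max(0.0, z)

# Print each activation for a few sample inputs to compare how they treat the "cutoff".
for z in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(f"z={z:+.1f}  step={step(z)}  sigmoid={sigmoid(z):.2f}  "
          f"tanh={math.tanh(z):+.2f}  relu={relu(z):.1f}")
```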

So, in a real ANN:

Linear Transformation = add up the weighted inputs.

Activation Function = a mathematical function that decides the neuron’s output.

Example: Predict if a student will pass or fail an exam

Inputs to the network:

Hours studied = 2

Hours slept = 8

We want the ANN to say Pass (1) or Fail (0).

Step 1: Inputs & Weights

Each input is multiplied by a weight (importance).

Suppose weights are:

Study hours weight = 0.6

Sleep hours weight = 0.4

Weighted inputs:

2 × 0.6 = 1.2

8 × 0.4 = 3.2

Step 2: Linear Transformation (Summation)

Now add them up with a bias (say b = –4).

z = (1.2 + 3.2) + (−4) = 0.4

Step 3: Activation Function

Let’s use Sigmoid:

f(z) = 1 / (1 + e^(−z))

For z = 0.4:

f(0.4) ≈ 0.60

Output = 0.60 (60% chance of passing).

Step 4: Final Decision

If probability ≥ 0.5 → Pass
If probability < 0.5 → Fail 

Here, 0.60 ≥ 0.5 → Pass.
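
The whole worked example fits in a few lines of Python; this is a minimal sketch using the same illustrative weights, bias, and 0.5 cutoff as above:

```python
import math

# One-neuron forward pass for the pass/fail example
# (weights, bias, and the 0.5 cutoff are the illustrative values from the text).
inputs  = [2, 8]       # hours studied, hours slept
weights = [0.6, 0.4]   # importance of each input
bias    = -4

# Steps 1-2: linear transformation (weighted sum plus bias)
z = sum(x * w for x, w in zip(inputs, weights)) + bias   # 0.4

# Step 3: sigmoid activation turns z into a probability
probability = 1 / (1 + math.exp(-z))                     # ~0.60

# Step 4: final decision against the 0.5 cutoff
decision = "Pass" if probability >= 0.5 else "Fail"
print(f"z = {z:.1f}, probability = {probability:.2f} -> {decision}")
```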

The network looks at how much you studied and slept.

It multiplies them by importance (weights).

Adds everything up (linear transformation).

Runs it through a decision formula (activation).

Finally says: Yes, this student will pass.