How TF-IDF (Term Frequency – Inverse Document Frequency) Works in NLP (Step-by-Step Example)

TF-IDF gives higher weight to words that are distinctive for a sentence and lower weight to words that are common across all sentences.

Bag of Words only counts.
TF-IDF counts + judges importance.

TF-IDF has TWO parts

TF — Term Frequency

(How often a word appears in a sentence)

TF tells us how important a word is inside one sentence.

Formula:

TF = (Number of times the word appears in the sentence) / (Total number of words in the sentence)

 IDF — Inverse Document Frequency

How rare a word is across all sentences

IDF tells us how special a word is across the whole dataset.

If a word appears in every sentence, it’s not special.
If it appears in only one sentence, it’s very special.

Formula:

IDF = log( Total sentences / Sentences containing the word )

Example:

Suppose we have 3 sentences after the preprocessing step:

S1 → good boy
S2 → good girl
S3 → boy girl good

Step 1: Build Vocabulary

Unique words:

{ good, boy, girl }

These are our features.
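
The vocabulary step can be sketched in a few lines of Python. This is only an illustration: the helper name build_vocabulary is hypothetical, and it assumes the sentences are already preprocessed and can be split on whitespace.

```python
# Preprocessed sentences from the example (assumed already lowercased and cleaned).
sentences = ["good boy", "good girl", "boy girl good"]

def build_vocabulary(docs):
    """Collect unique words in order of first appearance (hypothetical helper)."""
    vocab = []
    for doc in docs:
        for word in doc.split():
            if word not in vocab:
                vocab.append(word)
    return vocab

vocabulary = build_vocabulary(sentences)
print(vocabulary)  # ['good', 'boy', 'girl']
```

Keeping first-appearance order makes the feature columns match the order used in the tables of this example; a sorted list would work equally well as long as the order stays fixed.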

Step 2: Term Frequency (TF)

TF = (Number of times the word appears in the sentence) / (Total number of words in the sentence)

S1 = “good boy” (2 words)

  • good → 1/2
  • boy → 1/2
  • girl → 0

S2 = “good girl” (2 words)

  • good → 1/2
  • boy → 0
  • girl → 1/2

S3 = “boy girl good” (3 words)

  • good → 1/3
  • boy → 1/3
  • girl → 1/3

TF Table

Word    S1     S2     S3
good    1/2    1/2    1/3
boy     1/2    0      1/3
girl    0      1/2    1/3
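
The TF values can be reproduced with a short Python sketch. It assumes whitespace tokenization and uses exact fractions so the output matches the hand calculation; the function name term_frequency is made up for this illustration.

```python
from fractions import Fraction

sentences = ["good boy", "good girl", "boy girl good"]
vocabulary = ["good", "boy", "girl"]

def term_frequency(word, sentence):
    """TF = occurrences of the word in the sentence / total words in the sentence."""
    tokens = sentence.split()
    return Fraction(tokens.count(word), len(tokens))

# One row per word, one column per sentence (S1, S2, S3).
tf_table = {
    word: [term_frequency(word, s) for s in sentences]
    for word in vocabulary
}

for word, row in tf_table.items():
    print(word, [str(value) for value in row])
# good ['1/2', '1/2', '1/3']
# boy ['1/2', '0', '1/3']
# girl ['0', '1/2', '1/3']
```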

Step 3: Inverse Document Frequency (IDF)

Now we check:

In how many sentences does each word appear?

Total sentences = 3

S1 → good boy
S2 → good girl
S3 → boy girl good

Count sentence presence:

  • good → appears in S1, S2, S3 → 3 sentences
  • boy → appears in S1, S3 → 2 sentences
  • girl → appears in S2, S3 → 2 sentences

IDF Formula:

IDF = log( Total sentences / Sentences containing the word )

IDF Values

Word    Calculation    IDF
good    log(3/3)       0
boy     log(3/2)       > 0
girl    log(3/2)       > 0
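
To put numbers on the IDF values, here is a minimal Python sketch. The example does not state a log base, so the natural log used here is an assumption; with base 10, log(3/2) would be about 0.176 instead of about 0.405. The function name is also just illustrative.

```python
import math

sentences = ["good boy", "good girl", "boy girl good"]
vocabulary = ["good", "boy", "girl"]

def inverse_document_frequency(word, docs):
    """IDF = log(total sentences / sentences containing the word)."""
    containing = sum(1 for doc in docs if word in doc.split())
    # Assumes the word occurs in at least one sentence;
    # otherwise this would divide by zero.
    return math.log(len(docs) / containing)  # natural log (assumption)

idf = {word: inverse_document_frequency(word, sentences) for word in vocabulary}

for word, value in idf.items():
    print(f"{word}: {value:.3f}")
# good: 0.000
# boy: 0.405
# girl: 0.405
```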

Important Observation

“good” appears everywhere → IDF = 0 (not special)

“boy” and “girl” appear less → higher IDF (more important)

So TF-IDF automatically says:

“good” is common → reduce its weight
“boy” and “girl” are rarer → increase their weight

Step 4: TF × IDF = TF-IDF

Now we multiply:

TF-IDF = TF × IDF

Final TF-IDF Table

Sentence    good    boy                 girl
S1          0       (1/2) × log(3/2)    0
S2          0       0                   (1/2) × log(3/2)
S3          0       (1/3) × log(3/2)    (1/3) × log(3/2)
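
The whole table can be computed in one pass. The sketch below is an assumption-laden illustration rather than a reference implementation: it uses whitespace tokenization, the natural log, and the plain (unsmoothed) IDF formula from this example. The helper name tfidf_vector is hypothetical.

```python
import math

sentences = ["good boy", "good girl", "boy girl good"]
vocabulary = ["good", "boy", "girl"]

def tfidf_vector(sentence, docs, vocab):
    """Return one TF-IDF weight per vocabulary word for a sentence."""
    tokens = sentence.split()
    vector = []
    for word in vocab:
        tf = tokens.count(word) / len(tokens)
        containing = sum(1 for doc in docs if word in doc.split())
        idf = math.log(len(docs) / containing)  # natural log is an assumption
        vector.append(tf * idf)
    return vector

for name, sentence in zip(["S1", "S2", "S3"], sentences):
    weights = tfidf_vector(sentence, sentences, vocabulary)
    print(name, [round(w, 3) for w in weights])
# S1 [0.0, 0.203, 0.0]
# S2 [0.0, 0.0, 0.203]
# S3 [0.0, 0.135, 0.135]
```

Library implementations often use smoothed or normalized variants of these formulas, so their numbers can differ from this table.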

Explanation:

For S1 (“good boy”):

  • good → TF × IDF = anything × 0 = 0
  • boy → (1/2) × log(3/2) → positive value
  • girl → 0

So S1 becomes:

[0 , important , 0]

Meaning:

👉 “boy” is the key word.

For S2 (“good girl”):

[0 , 0 , important]

👉 “girl” is the key word.

For S3 (“boy girl good”):

[0 , small , small]

Because TF is smaller (1/3).

Main Idea of Final TF-IDF

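TF-IDF down-weights words that are common across the dataset and highlights rarer, more distinctive words. With the IDF formula used here, a word that appears in every sentence gets a weight of exactly 0.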

“good” disappears

“boy” and “girl” become important