Advantages and Disadvantages of TF-IDF (Term Frequency – Inverse Document Frequency)

Advantages of TF-IDF

Simple and Intuitive

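TF-IDF is easy to understand: a word's score is its frequency within a sentence (TF) multiplied by how rare the word is across all sentences (IDF).

So, important words get higher values.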
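To make the scoring idea concrete, here is a minimal sketch in plain Python. The toy sentences and the helper name tf_idf are made up for illustration, and the code assumes the basic variant TF x log(N / DF). Libraries such as scikit-learn use a smoothed and normalized version by default, so their numbers will differ.

```python
import math
from collections import Counter

# Toy corpus (made up for illustration)
sentences = [
    "the food is good",
    "the food is bad",
    "the pizza is amazing",
]
docs = [s.split() for s in sentences]
n_docs = len(docs)

# Document frequency: in how many sentences each word appears
df = Counter(word for doc in docs for word in set(doc))

def tf_idf(word, doc):
    tf = doc.count(word) / len(doc)     # how frequent the word is in this sentence
    idf = math.log(n_docs / df[word])   # rarer across sentences -> larger value
    return tf * idf

# Score every word of the first sentence
scores = {word: tf_idf(word, docs[0]) for word in docs[0]}
print(scores)
```

In this sketch "the" and "is" appear in every sentence, so their IDF is log(1) = 0 and their score is zero, while "good" appears in only one sentence and gets the highest score.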

Fixed-Size Input (Good for ML Algorithms)

Once the vocabulary is created, every sentence becomes a vector of size:

👉 Vocabulary size

So:

Machine Learning algorithms need fixed-size input, and TF-IDF provides that.
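The fixed-size property can be checked with a small sketch. The sentences and the helper name vectorize are assumptions for illustration, and the weighting is the basic TF x log(N / DF) variant rather than any particular library's formula.

```python
import math
from collections import Counter

# Sentences of different lengths (4, 3 and 1 words)
sentences = ["the food is good", "pizza is amazing", "bad"]
docs = [s.split() for s in sentences]

# The vocabulary fixes the number of dimensions
vocab = sorted({w for doc in docs for w in doc})
df = Counter(w for doc in docs for w in set(doc))

def vectorize(doc):
    counts = Counter(doc)
    # One slot per vocabulary word, regardless of sentence length
    return [(counts[w] / len(doc)) * math.log(len(docs) / df[w]) for w in vocab]

vectors = [vectorize(doc) for doc in docs]
lengths = [len(v) for v in vectors]
print(len(vocab), lengths)
```

Every vector has exactly len(vocab) entries, whether the sentence has four words or one.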

Word Importance Is Captured (Big Improvement over BoW)

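Unlike Bag of Words, which only counts occurrences and lets common words like "the" dominate, TF-IDF automatically highlights meaningful words by down-weighting words that appear in most sentences.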

This is a major advantage.
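A small comparison shows the difference. This is only a sketch: the sentences are invented, and the TF-IDF weighting assumed here is the basic TF x log(N / DF) form.

```python
import math
from collections import Counter

sentences = [
    "the food at the cafe is good",
    "the service is slow",
    "the view is nice",
]
docs = [s.split() for s in sentences]
df = Counter(w for doc in docs for w in set(doc))

doc = docs[0]

# Bag of Words: raw counts only
bow = Counter(doc)

# TF-IDF: counts scaled by how rare the word is across sentences
tfidf = {w: (c / len(doc)) * math.log(len(docs) / df[w]) for w, c in bow.items()}

top_bow = max(bow, key=bow.get)
top_tfidf = max(tfidf, key=tfidf.get)
print("BoW top word:", top_bow)
print("TF-IDF top word:", top_tfidf)
```

With raw counts, "the" is the top word of the first sentence. Under TF-IDF its weight drops to zero because it appears in every sentence, and a content word takes the top spot.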

Disadvantages of TF-IDF

Even though TF-IDF is better than BoW, it still has problems.

1️⃣ Sparsity Exists

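Each sentence uses only a few words from the vocabulary, so most values are still zero.

So:

👉 Sparse matrix
👉 High memory usage (if stored as a dense matrix)
👉 Possibility of overfitting (many features, few samples)

This is the same issue as with BoW and One-Hot encoding.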
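The amount of sparsity can be measured directly. The sketch below uses an invented four-sentence corpus and the basic TF x log(N / DF) weighting. The dictionary-per-row storage at the end is just one simple way to keep only non-zero entries, not how any specific library does it.

```python
import math
from collections import Counter

sentences = [
    "cats chase mice",
    "dogs chase cats",
    "birds eat seeds",
    "fish swim fast",
]
docs = [s.split() for s in sentences]
vocab = sorted({w for doc in docs for w in doc})
df = Counter(w for doc in docs for w in set(doc))

def vectorize(doc):
    counts = Counter(doc)
    return [(counts[w] / len(doc)) * math.log(len(docs) / df[w]) for w in vocab]

# Dense matrix: one row per sentence, one column per vocabulary word
matrix = [vectorize(doc) for doc in docs]

total = len(matrix) * len(vocab)
zeros = sum(1 for row in matrix for value in row if value == 0.0)
zero_fraction = zeros / total
print(f"{zeros} of {total} entries are zero ({zero_fraction:.0%})")

# Storing only the non-zero entries avoids wasting memory on zeros
sparse_rows = [{j: v for j, v in enumerate(row) if v != 0.0} for row in matrix]
stored = sum(len(row) for row in sparse_rows)
print("Entries actually stored:", stored)
```

Even with only 10 vocabulary words, most of the matrix is zero. With a real vocabulary of tens of thousands of words, the zero fraction is far higher.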

2️⃣ Out-of-Vocabulary (OOV) Problem

If a new word appears that was not in the training vocabulary:

👉 TF-IDF cannot represent it.

So:

New or unseen words are ignored.
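The OOV behaviour can be reproduced with a short sketch. The training sentences and the helper name transform are hypothetical, and raw counts are used for TF to keep the comparison simple.

```python
import math
from collections import Counter

# Vocabulary and IDF values are learned from the training sentences only
train = ["the food is good", "the food is bad", "pizza is amazing"]
train_docs = [s.split() for s in train]

vocab = sorted({w for doc in train_docs for w in doc})
df = Counter(w for doc in train_docs for w in set(doc))
idf = {w: math.log(len(train_docs) / df[w]) for w in vocab}

def transform(sentence):
    counts = Counter(sentence.split())
    # Only vocabulary words have a slot; any other word is silently dropped
    return [counts[w] * idf[w] for w in vocab]

new_sentence = "the burger is good"
unseen = [w for w in new_sentence.split() if w not in vocab]
print("Unseen words:", unseen)

# The vector is identical to the one for the sentence without "burger"
same = transform(new_sentence) == transform("the is good")
print("Same vector with and without the unseen word:", same)
```

Because "burger" has no column in the vocabulary, it contributes nothing to the vector, so the model cannot tell the two sentences apart.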

So, next we will move to Word2Vec.