Advantages of TF-IDF
Simple and Intuitive
TF-IDF is easy to understand:
- TF → how often a word appears in a sentence
- IDF → how rare the word is across all sentences
So, words that are frequent in one sentence but rare overall get higher values.
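The two parts can be sketched in a few lines of Python. This is a minimal illustration, assuming a toy corpus of three short sentences (the sentences themselves are an assumption, not taken from the text above) and the plain log(N / df) form of IDF; libraries often use smoothed variants, so their numbers will differ.

```python
import math

# Assumed toy corpus (illustrative only), already split into words.
sentences = [
    ["good", "boy"],
    ["good", "girl"],
    ["boy", "girl", "good"],
]

def tf(word, sentence):
    # Term frequency: occurrences of the word divided by sentence length.
    return sentence.count(word) / len(sentence)

def idf(word, corpus):
    # Inverse document frequency: log(total sentences / sentences containing the word).
    # Assumes the word occurs in at least one sentence of the corpus.
    containing = sum(1 for s in corpus if word in s)
    return math.log(len(corpus) / containing)

for word in ["good", "boy"]:
    score = tf(word, sentences[0]) * idf(word, sentences)
    print(word, round(score, 3))
# good 0.0
# boy 0.203
```

Here "good" occurs in every sentence, so its IDF is log(3/3) = 0 and its score vanishes, while "boy" keeps a positive score.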
Fixed-Size Input (Good for ML Algorithms)
Once the vocabulary is created, every sentence becomes a vector of size:
👉 Vocabulary size
So:
- S1 → vector length = vocab size
- S2 → same length
- S3 → same length
Most classical machine learning algorithms need fixed-size input, and TF-IDF provides that.
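A small sketch of the fixed-size property, assuming three example sentences standing in for S1, S2 and S3 (the actual sentences are an assumption) and a hypothetical helper named tfidf_vector:

```python
import math

# Assumed stand-ins for S1, S2, S3 (illustrative only).
sentences = ["good boy", "good girl", "boy girl good"]
tokenized = [s.split() for s in sentences]

# The vocabulary is built once from all sentences and then kept fixed.
vocab = sorted({w for s in tokenized for w in s})

def tfidf_vector(tokens, corpus, vocab):
    # One entry per vocabulary word, so the length never depends on the sentence.
    vec = []
    for word in vocab:
        tf = tokens.count(word) / len(tokens)
        df = sum(1 for s in corpus if word in s)
        vec.append(tf * math.log(len(corpus) / df))
    return vec

vectors = [tfidf_vector(s, tokenized, vocab) for s in tokenized]
for s, v in zip(sentences, vectors):
    print(s, "->", len(v))
# good boy -> 3
# good girl -> 3
# boy girl good -> 3
```

The two-word and three-word sentences both map to vectors of length 3, the vocabulary size.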
Word Importance Is Captured (Big Improvement over BoW)
Unlike Bag of Words:
- TF-IDF reduces the weight of common words like "good"
- TF-IDF increases the weight of rarer words like "boy" and "girl"
TF-IDF automatically highlights meaningful words.
This is a major advantage.
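To make the contrast with Bag of Words concrete, here is a rough comparison on one sentence. The corpus is an assumed toy example, and the IDF used is the unsmoothed log(N / df), so treat the exact numbers as illustrative.

```python
import math

# Assumed toy corpus; the sentences are illustrative only.
corpus = [s.split() for s in ["good boy", "good girl", "boy girl good"]]
vocab = sorted({w for s in corpus for w in s})

def bow(tokens):
    # Bag of Words: raw counts, every word is weighted the same.
    return [tokens.count(w) for w in vocab]

def tfidf(tokens):
    # TF-IDF: count scaled by sentence length, then by log(N / document frequency).
    n = len(corpus)
    return [
        tokens.count(w) / len(tokens) * math.log(n / sum(w in s for s in corpus))
        for w in vocab
    ]

s3 = corpus[2]
print(vocab)                              # ['boy', 'girl', 'good']
print(bow(s3))                            # [1, 1, 1]
print([round(x, 3) for x in tfidf(s3)])   # [0.135, 0.135, 0.0]
```

Bag of Words treats all three words equally, while TF-IDF pushes "good" (present in every sentence) down to zero and keeps "boy" and "girl" positive.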
Disadvantages of TF-IDF
Even though TF-IDF is better than BoW, it still has problems.
1️⃣ Sparsity Exists
Most values are still zero.
So:
👉 Sparse matrix
👉 High memory usage (if stored as a full matrix)
👉 Possibility of overfitting (many features, few examples)
This is the same issue as with BoW and One-Hot encoding.
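The effect shows up even on a tiny corpus once the sentences stop sharing words. The sketch below assumes four made-up sentences on unrelated topics, counts how many entries of the TF-IDF matrix are zero, and then keeps only the non-zero entries in per-sentence dictionaries as a simple sparse representation.

```python
import math

# Assumed example sentences on unrelated topics (illustrative only).
sentences = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock prices fell sharply today",
    "the chef cooked a spicy curry",
]
corpus = [s.split() for s in sentences]
vocab = sorted({w for s in corpus for w in s})
n = len(corpus)

def tfidf_row(tokens):
    # One TF-IDF value per vocabulary word, using unsmoothed log(N / df).
    return [
        tokens.count(w) / len(tokens) * math.log(n / sum(w in s for s in corpus))
        for w in vocab
    ]

matrix = [tfidf_row(s) for s in corpus]
total = len(matrix) * len(vocab)
zeros = sum(1 for row in matrix for x in row if x == 0.0)
zero_fraction = zeros / total
print(f"{n} sentences x {len(vocab)} words, {zeros} of {total} entries are zero")

# A sparse representation stores only the non-zero entries of each row.
sparse = [{w: x for w, x in zip(vocab, row) if x != 0.0} for row in matrix]
```

With a realistic vocabulary of tens of thousands of words, the zero fraction is typically far higher, which is why TF-IDF matrices are usually stored in a sparse format.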
2️⃣ Out-of-Vocabulary (OOV) Problem
If a new word appears that was not in the training vocabulary:
👉 TF-IDF cannot represent it.
So:
New or unseen words are simply ignored, and whatever information they carry is lost.
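A short sketch of the OOV behaviour, assuming the same kind of toy training sentences and a hypothetical transform helper. The word "bad" never appears in training, so it has no column in the vector.

```python
import math

# Assumed training sentences (illustrative only).
train = [s.split() for s in ["good boy", "good girl", "boy girl good"]]
vocab = sorted({w for s in train for w in s})  # fixed after training

def transform(sentence):
    tokens = sentence.split()
    # Words outside the training vocabulary have no column, so they are dropped.
    ignored = [w for w in tokens if w not in vocab]
    n = len(train)
    vec = [
        tokens.count(w) / len(tokens) * math.log(n / sum(w in s for s in train))
        for w in vocab
    ]
    return vec, ignored

vec, ignored = transform("bad boy")
print(vocab)                        # ['boy', 'girl', 'good']
print([round(x, 3) for x in vec])   # [0.203, 0.0, 0.0]
print(ignored)                      # ['bad']
```

The resulting vector carries no trace of "bad", so "bad boy" ends up with the same non-zero column as any sentence that only mentions "boy".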
So, next we will move to Word2Vec.
