What is One-Hot Encoding?
One-Hot Encoding is a technique that converts each word into a binary vector (0s and 1s) in which exactly one position is 1 and all others are 0. The length of the vector equals the number of words in the vocabulary.
In simple words:
Each word is represented by a vector with one “1” and the rest “0”.
Let’s Use a Small Example
Sentences:
S1: The food is good
S2: The food is bad
S3: Pizza is amazing
Step 1: Build Vocabulary (Unique Words)
Collect all unique words, ignoring case (so “The” and “the” count as one word):
[the, food, is, good, bad, pizza, amazing]
Vocabulary size = 7
These are the only words our computer knows. A word outside this list has no vector.
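A minimal sketch of this step in Python, assuming words are separated by spaces and lowercased before comparison. The variable names `sentences` and `vocabulary` are illustrative only.

```python
# The three example sentences from above.
sentences = [
    "The food is good",
    "The food is bad",
    "Pizza is amazing",
]

# Collect unique words in order of first appearance.
vocabulary = []
for sentence in sentences:
    for word in sentence.lower().split():
        if word not in vocabulary:
            vocabulary.append(word)

print(vocabulary)       # ['the', 'food', 'is', 'good', 'bad', 'pizza', 'amazing']
print(len(vocabulary))  # 7
```

A list is used instead of a set so that the word order stays the same on every run.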
Step 2: Assign One Position to Each Word
| Word | Position |
| --- | --- |
| the | 1 |
| food | 2 |
| is | 3 |
| good | 4 |
| bad | 5 |
| pizza | 6 |
| amazing | 7 |
So every word becomes a vector of length 7.
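The position table can be sketched as a Python dictionary. This is one possible representation, and the name `word_to_position` is an assumption. Note that the table counts from 1, while Python lists count from 0, so subtract 1 when using a position as a list index.

```python
vocabulary = ["the", "food", "is", "good", "bad", "pizza", "amazing"]

# start=1 reproduces the 1-based positions used in the table.
word_to_position = {word: pos for pos, word in enumerate(vocabulary, start=1)}

print(word_to_position["the"])      # 1
print(word_to_position["amazing"])  # 7
```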
Step 3: Convert Each Word to One-Hot Vector
Word → Vector
the → [1 0 0 0 0 0 0]
food → [0 1 0 0 0 0 0]
is → [0 0 1 0 0 0 0]
good → [0 0 0 1 0 0 0]
bad → [0 0 0 0 1 0 0]
pizza → [0 0 0 0 0 1 0]
amazing → [0 0 0 0 0 0 1]
👉 Only one position is 1; the rest are 0.
That’s why it’s called One-Hot.
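As a rough sketch, the word-to-vector conversion can be written as a small function. The name `one_hot` is hypothetical, and the function assumes the word is already in the vocabulary (an unknown word raises `ValueError`).

```python
vocabulary = ["the", "food", "is", "good", "bad", "pizza", "amazing"]

def one_hot(word, vocabulary):
    """Return a list of 0s with a single 1 at the word's position."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1  # 0-based index into the list
    return vector

print(one_hot("good", vocabulary))   # [0, 0, 0, 1, 0, 0, 0]
print(one_hot("pizza", vocabulary))  # [0, 0, 0, 0, 0, 1, 0]
```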
Step 4: Encode a Full Sentence
Sentence S1:
The food is good
This sentence has 4 words, so it will be represented as 4 vectors.
the → [1 0 0 0 0 0 0]
food → [0 1 0 0 0 0 0]
is → [0 0 1 0 0 0 0]
good → [0 0 0 1 0 0 0]
So Sentence S1 becomes a matrix:
S1 One-Hot Matrix:
S1 = [
[1 0 0 0 0 0 0],
[0 1 0 0 0 0 0],
[0 0 1 0 0 0 0],
[0 0 0 1 0 0 0]
]
A sentence is represented as a stack of one-hot vectors, one row per word.
Each row is a vector of size 7, so S1 becomes a 4 × 7 matrix.
Size = 4 × 7
(4 words × 7 vocabulary)
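The sentence encoding for S1 could look like this in plain Python, with the matrix stored as a list of rows. The helper name `encode_sentence` is an assumption, and the sketch expects every word of the sentence to be in the vocabulary.

```python
vocabulary = ["the", "food", "is", "good", "bad", "pizza", "amazing"]

def encode_sentence(sentence, vocabulary):
    """Return one one-hot row per word of the sentence."""
    matrix = []
    for word in sentence.lower().split():
        row = [0] * len(vocabulary)
        row[vocabulary.index(word)] = 1
        matrix.append(row)
    return matrix

s1 = encode_sentence("The food is good", vocabulary)
for row in s1:
    print(row)

print(len(s1), "x", len(s1[0]))  # 4 x 7
```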
Sentence S2
S2 = “The food is bad”
Also 4 words, so again:
Size = 4 × 7
S2 One-Hot Matrix:
[
[1 0 0 0 0 0 0], ← the
[0 1 0 0 0 0 0], ← food
[0 0 1 0 0 0 0], ← is
[0 0 0 0 1 0 0] ← bad
]
Compared with S1, only the last row changes (good vs. bad).
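To check which rows differ between S1 and S2, the two matrices can be compared row by row. This is only an illustrative sketch; `encode_sentence` and `differing_rows` are made-up names, and row indices are 0-based, so index 3 is the fourth (last) row.

```python
vocabulary = ["the", "food", "is", "good", "bad", "pizza", "amazing"]

def encode_sentence(sentence):
    """Return one one-hot row per word, using the vocabulary above."""
    matrix = []
    for word in sentence.lower().split():
        row = [0] * len(vocabulary)
        row[vocabulary.index(word)] = 1
        matrix.append(row)
    return matrix

s1 = encode_sentence("The food is good")
s2 = encode_sentence("The food is bad")

# Indices of the rows where the two matrices disagree.
differing_rows = [i for i, (a, b) in enumerate(zip(s1, s2)) if a != b]
print(differing_rows)  # [3]
```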
Sentence S3
S3 = “Pizza is amazing”
This sentence has only 3 words:
- pizza
- is
- amazing
So S3 becomes a 3 × 7 matrix.
S3 One-Hot Matrix:
[
[0 0 0 0 0 1 0], ← pizza
[0 0 1 0 0 0 0], ← is
[0 0 0 0 0 0 1] ← amazing
]
📌 Size = 3 × 7
(3 words × 7 vocabulary)
Fewer words → fewer rows.
Note:
👉 Columns depend on vocabulary size
👉 Rows depend on sentence length
So:
- Vocabulary = 7 → always 7 columns
- S1 has 4 words → 4 rows
- S3 has 3 words → 3 rows
Sentence Matrix Size = Number of Words × Vocabulary Size
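The size formula can be checked for all three sentences without building the matrices. In this sketch the dictionary keys and the name `shapes` are illustrative.

```python
vocabulary = ["the", "food", "is", "good", "bad", "pizza", "amazing"]

sentences = {
    "S1": "The food is good",
    "S2": "The food is bad",
    "S3": "Pizza is amazing",
}

# rows = number of words in the sentence, columns = vocabulary size
shapes = {
    name: (len(text.split()), len(vocabulary))
    for name, text in sentences.items()
}

for name, (rows, cols) in shapes.items():
    print(name, rows, "x", cols)
```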
Another Important Point
Are “good” and “amazing” related?
Human: YES
One-Hot: NO
Because:
good → [0 0 0 1 0 0 0]
amazing → [0 0 0 0 0 0 1]
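No similarity: the two vectors never have a 1 in the same position, so their dot product is 0.
The same is true for good and bad, so every pair of different words looks equally unrelated.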
So:
👉 One-Hot Encoding does NOT capture meaning.
That’s why we use Word Embeddings later.
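One way to see this numerically is the dot product, used here as a simple similarity score (an assumption for illustration; other measures such as cosine similarity give the same picture for one-hot vectors). The helper `dot` is a hypothetical name.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

good    = [0, 0, 0, 1, 0, 0, 0]
bad     = [0, 0, 0, 0, 1, 0, 0]
amazing = [0, 0, 0, 0, 0, 0, 1]

print(dot(good, amazing))  # 0
print(dot(good, bad))      # 0
print(dot(good, good))     # 1
```

"good" scores 0 against both "amazing" and "bad", so the encoding cannot tell a similar word from an opposite one.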
