How One-Hot Encoding Works in NLP (Step-by-Step Example)

What is One-Hot Encoding?

One-Hot Encoding is a technique that converts each word into a binary vector (0s and 1s) whose length equals the vocabulary size. Exactly one position is 1 and all the others are 0.

In simple words:

Each word is represented by a vector with one “1” and the rest “0”.

Let’s Use a Small Example

Sentences:

S1: The food is good
S2: The food is bad
S3: Pizza is amazing

Step 1: Build Vocabulary (Unique Words)

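Lowercase every word (so “The” and “the” count as the same word), then collect the unique words in order of first appearance: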

[the, food, is, good, bad, pizza, amazing]

Vocabulary size = 7

These are all the distinct words our model knows.
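Step 1 can be sketched in Python as follows. This is a minimal version that assumes whitespace splitting after lowercasing (real tokenizers also handle punctuation), and `build_vocabulary` is a hypothetical helper name, not a library function.

```python
def build_vocabulary(sentences):
    """Return the unique lowercase words in order of first appearance."""
    vocab = []
    seen = set()
    for sentence in sentences:
        # Lowercase so "The" and "the" are the same word,
        # then split on whitespace (a simplifying assumption).
        for word in sentence.lower().split():
            if word not in seen:
                seen.add(word)
                vocab.append(word)
    return vocab


sentences = ["The food is good", "The food is bad", "Pizza is amazing"]
vocab = build_vocabulary(sentences)
print(vocab)       # ['the', 'food', 'is', 'good', 'bad', 'pizza', 'amazing']
print(len(vocab))  # 7
```

Keeping first-appearance order (rather than sorting) reproduces the exact word order used in this example.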

Step 2: Assign One Position to Each Word

Word	Position
the	1
food	2
is	3
good	4
bad	5
pizza	6
amazing	7

So every word becomes a vector of length 7.
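The position table from Step 2 is just a lookup from word to index. In this sketch the vocabulary is hard-coded so the snippet runs on its own, and `word_to_index` is an illustrative name. One difference to note: Python counts from 0, so “the” gets index 0 here where the table above says position 1. The resulting vectors are the same.

```python
vocab = ["the", "food", "is", "good", "bad", "pizza", "amazing"]

# enumerate() counts from 0, so index 0 here is position 1 in the table.
word_to_index = {word: i for i, word in enumerate(vocab)}

print(word_to_index["the"])      # 0
print(word_to_index["amazing"])  # 6
print(len(vocab))                # vector length: 7
```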

Step 3: Convert Each Word to a One-Hot Vector

Word → Vector

the → [1 0 0 0 0 0 0]

food → [0 1 0 0 0 0 0]

is → [0 0 1 0 0 0 0]

good → [0 0 0 1 0 0 0]

bad → [0 0 0 0 1 0 0]

pizza → [0 0 0 0 0 1 0]

amazing → [0 0 0 0 0 0 1]

👉 Only one position is 1 (the “hot” one); the rest are 0.
That’s why it’s called One-Hot.
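A minimal Python sketch of Step 3, assuming the 0-based index lookup described for Step 2. The helper name `one_hot` is hypothetical; the vocabulary is repeated so the block is self-contained.

```python
vocab = ["the", "food", "is", "good", "bad", "pizza", "amazing"]
word_to_index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """Return len(vocab) zeros with a single 1 at the word's index."""
    vector = [0] * len(vocab)
    vector[word_to_index[word.lower()]] = 1
    return vector


for word in vocab:
    print(word, "->", one_hot(word))
```

In this sketch a word that is not in the vocabulary raises a `KeyError`, because it has no assigned position.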

Step 4: Encode a Full Sentence

Sentence S1:

The food is good

This sentence has 4 words, so it will be represented as 4 vectors.

the → [1 0 0 0 0 0 0]

food → [0 1 0 0 0 0 0]

is → [0 0 1 0 0 0 0]

good → [0 0 0 1 0 0 0]

Stacking these vectors as rows turns S1 into a matrix:

S1 One-Hot Matrix:

S1 = [
[1 0 0 0 0 0 0], ← the
[0 1 0 0 0 0 0], ← food
[0 0 1 0 0 0 0], ← is
[0 0 0 1 0 0 0]  ← good
]

A sentence is represented as multiple one-hot vectors, one row per word, and each row has size 7.

So S1 becomes a 4 × 7 matrix (4 words × 7 vocabulary words).
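Step 4 in code: a sketch that stacks one one-hot row per word, in sentence order. `one_hot` and `encode_sentence` are hypothetical helper names, and the vocabulary is hard-coded from Step 1 so the block runs by itself.

```python
vocab = ["the", "food", "is", "good", "bad", "pizza", "amazing"]
word_to_index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """One row: len(vocab) zeros with a single 1 at the word's index."""
    vector = [0] * len(vocab)
    vector[word_to_index[word.lower()]] = 1
    return vector


def encode_sentence(sentence):
    """Stack one one-hot row per word, in sentence order."""
    return [one_hot(word) for word in sentence.split()]


s1 = encode_sentence("The food is good")
for row in s1:
    print(row)
print(len(s1), "x", len(s1[0]))  # 4 x 7
```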

Sentence S2

S2 = “The food is bad”

Also 4 words, so again:

Size = 4 × 7

S2 One-Hot Matrix:

[
[1 0 0 0 0 0 0], ← the
[0 1 0 0 0 0 0], ← food
[0 0 1 0 0 0 0], ← is
[0 0 0 0 1 0 0]  ← bad
]

Compared with S1, only the last row changes (good vs bad).
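To check the S1/S2 comparison mechanically, this sketch encodes both sentences and lists the row indices that differ. The helper `encode_sentence` is hypothetical and assumes lowercasing plus whitespace splitting, as in the earlier steps.

```python
vocab = ["the", "food", "is", "good", "bad", "pizza", "amazing"]
word_to_index = {word: i for i, word in enumerate(vocab)}


def encode_sentence(sentence):
    """One one-hot row per word (lowercased, split on whitespace)."""
    rows = []
    for word in sentence.lower().split():
        row = [0] * len(vocab)
        row[word_to_index[word]] = 1
        rows.append(row)
    return rows


s1 = encode_sentence("The food is good")
s2 = encode_sentence("The food is bad")

# Both matrices are 4 x 7, so rows can be compared pairwise.
changed = [i for i, (r1, r2) in enumerate(zip(s1, s2)) if r1 != r2]
print(changed)  # [3]: only the last row (good vs bad) differs
```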

Sentence S3

S3 = “Pizza is amazing”

This sentence has only 3 words:

pizza
is
amazing

So S3 becomes a 3 × 7 matrix.

S3 One-Hot Matrix:

[
[0 0 0 0 0 1 0], ← pizza
[0 0 1 0 0 0 0], ← is
[0 0 0 0 0 0 1]  ← amazing
]

📌 Size = 3 × 7

(3 words × 7 vocabulary)

Fewer words → fewer rows.

Note:

👉 Columns depend on the vocabulary size

👉 Rows depend on the sentence length

So:

Vocabulary = 7 → always 7 columns
S1 has 4 words → 4 rows
S3 has 3 words → 3 rows

Sentence Matrix Size = Number of Words × Vocabulary Size
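The size rule can be verified for all three sentences with a short sketch. As before, `encode_sentence` is a hypothetical helper that assumes lowercasing and whitespace splitting; the shape is read off as (rows, columns).

```python
vocab = ["the", "food", "is", "good", "bad", "pizza", "amazing"]
word_to_index = {word: i for i, word in enumerate(vocab)}


def encode_sentence(sentence):
    """Return a (number of words) x (vocabulary size) one-hot matrix."""
    rows = []
    for word in sentence.lower().split():
        row = [0] * len(vocab)
        row[word_to_index[word]] = 1
        rows.append(row)
    return rows


sentences = {
    "S1": "The food is good",
    "S2": "The food is bad",
    "S3": "Pizza is amazing",
}
shapes = {}
for name, text in sentences.items():
    matrix = encode_sentence(text)
    # Rows = words in the sentence, columns = vocabulary size.
    shapes[name] = (len(matrix), len(matrix[0]))
    print(name, shapes[name])
# S1 (4, 7)
# S2 (4, 7)
# S3 (3, 7)
```

Only the row count varies between sentences; the column count stays fixed at 7 because every sentence shares the same vocabulary.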

Another Important Point

Are “good” and “amazing” related?

Human: YES
One-Hot: NO

Because:

good → [0 0 0 1 0 0 0]

amazing → [0 0 0 0 0 0 1]

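The two vectors never have a 1 in the same position, so their dot product is 0, exactly as it is for any other pair of different words. One-hot vectors treat every pair of words as equally unrelated.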
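The “no similarity” point can be checked numerically with cosine similarity. This is a sketch with hand-written `dot` and `cosine` helpers (hypothetical names, standard library only): “good” is exactly as similar to “amazing” as it is to “bad”, and only identical words score 1.

```python
import math

vocab = ["the", "food", "is", "good", "bad", "pizza", "amazing"]


def one_hot(word):
    vector = [0] * len(vocab)
    vector[vocab.index(word)] = 1
    return vector


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


good, amazing, bad = one_hot("good"), one_hot("amazing"), one_hot("bad")
print(cosine(good, amazing))  # 0.0
print(cosine(good, bad))      # 0.0
print(cosine(good, good))     # 1.0
```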

So:

👉 One-Hot Encoding does NOT capture meaning.

That’s why we use Word Embeddings later: they are designed to give related words such as “good” and “amazing” similar vectors.