Skip-Gram
CBOW combines all context words together and predicts the center word.
But sometimes we want the opposite.
We may want to know:
Given a word, which words usually appear around it?
That leads to Skip-Gram.
Skip-Gram Idea
Skip-Gram does the reverse task.
Instead of predicting target from context, it predicts context from target.
So the direction changes.
CBOW : Context → Target
Skip-Gram : Target → Context
Definition
Skip-Gram predicts the surrounding context words using the target word.
Structure:
Input Word (One-hot)
↓
Embedding Layer
↓
Output Layer (Softmax)
↓
Predicted Context Word
So the model:
- Takes target word as input
- Converts it to embedding
- Predicts context words
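The structure above can be sketched in plain NumPy. The vocabulary, embedding size, and weight names below are illustrative assumptions, not fixed by Skip-Gram itself:

```python
import numpy as np

vocab = ["NLP", "NAME", "IS", "RELATED", "TO", "DATA", "SCIENCE"]
V, D = len(vocab), 4  # vocab size; embedding dimension D chosen arbitrarily

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # embedding layer (input weights)
W_out = rng.normal(scale=0.1, size=(D, V))  # output layer weights

def forward(target_idx):
    """One-hot input -> embedding -> softmax over the whole vocabulary."""
    h = W_in[target_idx]           # selecting a row == one-hot vector @ W_in
    scores = h @ W_out             # one raw score per vocabulary word
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()         # probabilities over all V words

probs = forward(vocab.index("IS"))
print(probs.shape)  # (7,)
```

Note that multiplying a one-hot vector by `W_in` just selects one row, which is why the embedding lookup is written as simple indexing.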
1️⃣ Example Sentence
NLP NAME IS RELATED TO DATA SCIENCE
Suppose the target word = IS (window size = 2).
Context words:
NLP, NAME, RELATED, TO
2️⃣ Training Pairs in Skip-Gram
Skip-Gram creates separate training pairs.
Instead of predicting the center word, it predicts each surrounding word.
So the pairs become:
(IS → NLP)
(IS → NAME)
(IS → RELATED)
(IS → TO)
Each pair is treated as a separate training example.
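Generating these pairs can be sketched with a small helper (a toy function, window size 2 assumed to match the example):

```python
def skipgram_pairs(tokens, window=2):
    """Emit one (target, context) pair per neighbor inside the window."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

sentence = "NLP NAME IS RELATED TO DATA SCIENCE".split()
pairs = skipgram_pairs(sentence, window=2)
print([p for p in pairs if p[0] == "IS"])
# [('IS', 'NLP'), ('IS', 'NAME'), ('IS', 'RELATED'), ('IS', 'TO')]
```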
3️⃣ What the Model Actually Does
Suppose the vocabulary is:
[NLP, NAME, IS, RELATED, TO, DATA, SCIENCE]
Input:
IS
The model produces a probability for every word in the vocabulary.
An illustrative output:
| Word | Probability |
|------|-------------|
| NLP | 0.15 |
| NAME | 0.18 |
| IS | 0.05 |
| RELATED | 0.22 |
| TO | 0.20 |
| DATA | 0.10 |
| SCIENCE | 0.10 |
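Since these numbers come from a softmax, they form a probability distribution. A quick sanity check on the (made-up) table values:

```python
# The illustrative table as a dictionary (these are the made-up example values)
table = {"NLP": 0.15, "NAME": 0.18, "IS": 0.05, "RELATED": 0.22,
         "TO": 0.20, "DATA": 0.10, "SCIENCE": 0.10}

assert abs(sum(table.values()) - 1.0) < 1e-9  # softmax outputs sum to 1
most_likely = max(table, key=table.get)
print(most_likely)  # RELATED
```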
4️⃣ Training Step
For the pair:
(IS → NLP)
Correct answer = NLP
The model adjusts its weights so that:
P(NLP | IS) increases
Next training pair:
(IS → NAME)
Now correct answer = NAME
Model adjusts weights again so that:
P(NAME | IS) increases
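One such weight update can be sketched as a single SGD step on the full softmax; this is a minimal toy version (random initialization, arbitrary learning rate and embedding size), not the optimized Word2Vec implementation:

```python
import numpy as np

vocab = ["NLP", "NAME", "IS", "RELATED", "TO", "DATA", "SCIENCE"]
V, D, lr = len(vocab), 4, 0.5
rng = np.random.default_rng(1)
W_in = rng.normal(scale=0.1, size=(V, D))   # embedding layer
W_out = rng.normal(scale=0.1, size=(D, V))  # output layer

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def train_pair(t, c):
    """One SGD step on the training pair (target t -> context c)."""
    h = W_in[t]
    p = softmax(h @ W_out)
    err = p.copy()
    err[c] -= 1.0                       # cross-entropy gradient w.r.t. scores
    grad_in = W_out @ err               # compute before W_out changes
    W_out[:] -= lr * np.outer(h, err)
    W_in[t] -= lr * grad_in

t, c = vocab.index("IS"), vocab.index("NLP")
before = softmax(W_in[t] @ W_out)[c]
train_pair(t, c)
after = softmax(W_in[t] @ W_out)[c]
print(after > before)  # True: P(NLP | IS) went up
```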
5️⃣ Important Idea
Skip-Gram makes multiple predictions from the same target word.
IS → NLP
IS → NAME
IS → RELATED
IS → TO
Each one is a separate training example.
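Because every (target, neighbor) combination is its own example, even one sentence yields many examples. A quick count (assuming window size 2, matching the four context words above):

```python
sentence = "NLP NAME IS RELATED TO DATA SCIENCE".split()
window = 2

# one training example per (target position, in-window neighbor) combination
n_examples = sum(
    min(i + window, len(sentence) - 1) - max(i - window, 0)
    for i in range(len(sentence))
)
print(n_examples)  # 22 examples from a single 7-word sentence
```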
CBOW vs Skip-Gram (Simple Comparison)
| Model | Input | Output |
|-------|-------|--------|
| CBOW | Context words | Target word |
| Skip-Gram | Target word | Context words |
