Skip-Gram

CBOW combines all the context words and predicts the center word.

But sometimes we want the opposite.

We may want to know:

Given a word, which words usually appear around it?

That leads to Skip-Gram.

Skip-Gram Idea

Skip-Gram does the reverse task.

Instead of predicting target from context, it predicts context from target.

So the direction changes.

CBOW      : Context → Target
Skip-Gram : Target  → Context

Definition

Skip-Gram predicts the surrounding context words using the target word.

Structure:

Input Word (One-hot)
        ↓
Embedding Layer
        ↓
Output Layer (Softmax)
        ↓
Predicted Context Word

So the model:

  1. Takes target word as input
  2. Converts it to embedding
  3. Predicts context words
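These three steps can be sketched in plain NumPy. This is a toy illustration with random weights and made-up sizes, not a trained model:

```python
import numpy as np

np.random.seed(0)

# Toy vocabulary from the example sentence below (illustrative only)
vocab = ["NLP", "NAME", "IS", "RELATED", "TO", "DATA", "SCIENCE"]
word_to_idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 4                 # vocabulary size, embedding dimension

W_in = np.random.randn(V, D) * 0.1   # embedding layer (input weights)
W_out = np.random.randn(D, V) * 0.1  # output layer weights

def skipgram_forward(target_word):
    """One-hot -> embedding -> softmax over the whole vocabulary."""
    v = W_in[word_to_idx[target_word]]   # embedding lookup (== one-hot @ W_in)
    scores = v @ W_out                   # one score per vocabulary word
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    return exp / exp.sum()

probs = skipgram_forward("IS")
print(dict(zip(vocab, probs.round(3))))
```

The output is a probability distribution over the whole vocabulary: one probability per possible context word.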

Example Sentence

NLP NAME IS RELATED TO DATA SCIENCE

Suppose the target word = IS

Context words (window size = 2):

NLP, NAME, RELATED, TO


Training Pairs in Skip-Gram

Skip-Gram creates separate training pairs.

Instead of predicting the center word, it predicts each surrounding word.

So the pairs become:

(IS → NLP)
(IS → NAME)
(IS → RELATED)
(IS → TO)

Each pair is treated as a separate training example.
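Generating these pairs from the example sentence takes only a small loop. A window size of 2 is assumed here to match the four context words above:

```python
# Build (target -> context) training pairs with a context window of 2
sentence = "NLP NAME IS RELATED TO DATA SCIENCE".split()
window = 2

pairs = []
for i, target in enumerate(sentence):
    # every word within `window` positions of the target, excluding itself
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((target, sentence[j]))

print([p for p in pairs if p[0] == "IS"])
# -> [('IS', 'NLP'), ('IS', 'NAME'), ('IS', 'RELATED'), ('IS', 'TO')]
```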


What the Model Actually Does

Suppose the vocabulary is:

[NLP, NAME, IS, RELATED, TO, DATA, SCIENCE]

Input:

IS

The model produces a probability for every word in the vocabulary.

Example output (illustrative numbers):

Word       Probability
NLP        0.15
NAME       0.18
IS         0.05
RELATED    0.22
TO         0.20
DATA       0.10
SCIENCE    0.10

Training Step

For the pair:

(IS → NLP)

Correct answer = NLP

The model adjusts the weights so that:

P(NLP | IS) increases


Next training pair:

(IS → NAME)

Now correct answer = NAME

The model adjusts the weights again so that:

P(NAME | IS) increases
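A minimal sketch of one such update, assuming plain SGD on the cross-entropy loss -log P(context | target), with a toy model and random initial weights:

```python
import numpy as np

np.random.seed(0)

vocab = ["NLP", "NAME", "IS", "RELATED", "TO", "DATA", "SCIENCE"]
idx = {w: i for i, w in enumerate(vocab)}
V, D, lr = len(vocab), 4, 0.5

W_in = np.random.randn(V, D) * 0.1   # embedding layer
W_out = np.random.randn(D, V) * 0.1  # output layer

def prob_of(context, target):
    """P(context | target) under the current weights."""
    v = W_in[idx[target]]
    s = v @ W_out
    p = np.exp(s - s.max()); p /= p.sum()
    return p[idx[context]]

def train_step(target, context):
    """One SGD step on the loss -log P(context | target)."""
    global W_in, W_out
    t, c = idx[target], idx[context]
    v = W_in[t]
    s = v @ W_out
    p = np.exp(s - s.max()); p /= p.sum()
    g = p.copy(); g[c] -= 1.0            # d(loss)/d(scores)
    W_in[t] -= lr * (W_out @ g)          # update target embedding
    W_out  -= lr * np.outer(v, g)        # update output weights

before = prob_of("NAME", "IS")
train_step("IS", "NAME")
after = prob_of("NAME", "IS")
print(before, "->", after)
```

After the update, P(NAME | IS) is higher than before, exactly as described above.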


Important Idea

Skip-Gram does multiple predictions using the same target word.

IS → NLP
IS → NAME
IS → RELATED
IS → TO

Each one is a separate training example.
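Looping over all four pairs repeatedly, the same kind of update drives the average loss down. A toy sketch, again assuming SGD with cross-entropy:

```python
import numpy as np

np.random.seed(0)

vocab = ["NLP", "NAME", "IS", "RELATED", "TO", "DATA", "SCIENCE"]
idx = {w: i for i, w in enumerate(vocab)}
V, D, lr = len(vocab), 4, 0.1

W_in = np.random.randn(V, D) * 0.1
W_out = np.random.randn(D, V) * 0.1

# The four separate training examples for target IS
pairs = [("IS", "NLP"), ("IS", "NAME"), ("IS", "RELATED"), ("IS", "TO")]

def step(target, context):
    """One SGD update; returns the cross-entropy loss for this pair."""
    global W_in, W_out
    t, c = idx[target], idx[context]
    v = W_in[t]
    s = v @ W_out
    p = np.exp(s - s.max()); p /= p.sum()
    loss = -np.log(p[c])
    g = p.copy(); g[c] -= 1.0
    W_in[t] -= lr * (W_out @ g)
    W_out  -= lr * np.outer(v, g)
    return loss

losses = []
for epoch in range(50):
    losses.append(sum(step(t, c) for t, c in pairs) / len(pairs))
print(f"{losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because every pair is its own training example, the embedding of IS is pulled toward all four of its context words over the epochs.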


CBOW vs Skip-Gram (Simple Comparison)

Model       Input           Output
CBOW        Context words   Target word
Skip-Gram   Target word     Context words
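The difference in inputs can be made concrete with a short sketch (toy embedding matrix, illustrative only):

```python
import numpy as np

np.random.seed(0)

vocab = ["NLP", "NAME", "IS", "RELATED", "TO", "DATA", "SCIENCE"]
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 4
W_in = np.random.randn(V, D) * 0.1

context = ["NLP", "NAME", "RELATED", "TO"]

# CBOW: average the context embeddings into ONE input vector,
# then make a single prediction (the target word).
cbow_input = W_in[[idx[w] for w in context]].mean(axis=0)  # shape (D,)

# Skip-Gram: the single target embedding is the input,
# reused once per context word (four predictions here).
skipgram_input = W_in[idx["IS"]]                           # shape (D,)
n_predictions = len(context)

print(cbow_input.shape, skipgram_input.shape, n_predictions)
```

Both models end up with an input vector of the same size; what differs is how that vector is built and how many predictions are made from it.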