Skip-Gram
CBOW combines all context words together and predicts the center word.
But sometimes we want the opposite.
We may want to know:
Given a word, which words usually appear around it?
That leads to Skip-Gram.
Skip-Gram Idea
Skip-Gram does the reverse task.
Instead of predicting target from context, it predicts context from target.
So the direction changes.
CBOW : Context → Target
Skip-Gram : Target → Context
Definition
Skip-Gram predicts the surrounding context words using the target word.
Structure:
Input Word (One-hot)
↓
Embedding Layer
↓
Output Layer (Softmax)
↓
Predicted Context Word
So the model:
- Takes target word as input
- Converts it to embedding
- Predicts context words
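The structure above can be sketched in plain NumPy. The vocabulary, embedding size, and weight names below are illustrative assumptions, not fixed by Skip-Gram itself:

```python
import numpy as np

vocab = ["NLP", "NAME", "IS", "RELATED", "TO", "DATA", "SCIENCE"]
V, D = len(vocab), 4  # vocab size; embedding dimension D chosen arbitrarily

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # embedding layer (input weights)
W_out = rng.normal(scale=0.1, size=(D, V))  # output layer weights

def forward(target_idx):
    """One-hot input -> embedding -> softmax over the whole vocabulary."""
    h = W_in[target_idx]           # selecting a row == one-hot vector @ W_in
    scores = h @ W_out             # one raw score per vocabulary word
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()         # probabilities over all V words

probs = forward(vocab.index("IS"))
print(probs.shape)  # (7,)
```

Note that multiplying a one-hot vector by `W_in` just selects one row, which is why the embedding lookup is written as simple indexing.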
1️⃣ Example Sentence
NLP NAME IS RELATED TO DATA SCIENCE
Suppose the target word = IS (window size = 2).
Context words:
NLP, NAME, RELATED, TO
2️⃣ Training Pairs in Skip-Gram
Skip-Gram creates separate training pairs.
Instead of predicting the center word, it predicts each surrounding word.
So the pairs become:
(IS → NLP)
(IS → NAME)
(IS → RELATED)
(IS → TO)
Each pair is treated as a separate training example.
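Generating these pairs can be sketched with a small helper (a toy function, window size 2 assumed to match the example):

```python
def skipgram_pairs(tokens, window=2):
    """Emit one (target, context) pair per neighbor inside the window."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

sentence = "NLP NAME IS RELATED TO DATA SCIENCE".split()
pairs = skipgram_pairs(sentence, window=2)
print([p for p in pairs if p[0] == "IS"])
# [('IS', 'NLP'), ('IS', 'NAME'), ('IS', 'RELATED'), ('IS', 'TO')]
```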
3️⃣ What the Model Actually Does
Suppose the vocabulary is:
[NLP, NAME, IS, RELATED, TO, DATA, SCIENCE]
Input:
IS
The model produces a probability for every word in the vocabulary.
An illustrative output:
| Word | Probability |
|------|-------------|
| NLP | 0.15 |
| NAME | 0.18 |
| IS | 0.05 |
| RELATED | 0.22 |
| TO | 0.20 |
| DATA | 0.10 |
| SCIENCE | 0.10 |
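Since these numbers come from a softmax, they form a probability distribution. A quick sanity check on the (made-up) table values:

```python
# The illustrative table as a dictionary (these are the made-up example values)
table = {"NLP": 0.15, "NAME": 0.18, "IS": 0.05, "RELATED": 0.22,
         "TO": 0.20, "DATA": 0.10, "SCIENCE": 0.10}

assert abs(sum(table.values()) - 1.0) < 1e-9  # softmax outputs sum to 1
most_likely = max(table, key=table.get)
print(most_likely)  # RELATED
```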
4️⃣ Training Step
For the pair:
(IS → NLP)
Correct answer = NLP
The model adjusts its weights so that:
P(NLP | IS) increases
Next training pair:
(IS → NAME)
Now correct answer = NAME
Model adjusts weights again so that:
P(NAME | IS) increases
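One such weight update can be sketched as a single SGD step on the full softmax; this is a minimal toy version (random initialization, arbitrary learning rate and embedding size), not the optimized Word2Vec implementation:

```python
import numpy as np

vocab = ["NLP", "NAME", "IS", "RELATED", "TO", "DATA", "SCIENCE"]
V, D, lr = len(vocab), 4, 0.5
rng = np.random.default_rng(1)
W_in = rng.normal(scale=0.1, size=(V, D))   # embedding layer
W_out = rng.normal(scale=0.1, size=(D, V))  # output layer

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def train_pair(t, c):
    """One SGD step on the training pair (target t -> context c)."""
    h = W_in[t]
    p = softmax(h @ W_out)
    err = p.copy()
    err[c] -= 1.0                       # cross-entropy gradient w.r.t. scores
    grad_in = W_out @ err               # compute before W_out changes
    W_out[:] -= lr * np.outer(h, err)
    W_in[t] -= lr * grad_in

t, c = vocab.index("IS"), vocab.index("NLP")
before = softmax(W_in[t] @ W_out)[c]
train_pair(t, c)
after = softmax(W_in[t] @ W_out)[c]
print(after > before)  # True: P(NLP | IS) went up
```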
5️⃣ Important Idea
Skip-Gram makes multiple predictions from the same target word.
IS → NLP
IS → NAME
IS → RELATED
IS → TO
Each one is a separate training example.
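Because every (target, neighbor) combination is its own example, even one sentence yields many examples. A quick count (assuming window size 2, matching the four context words above):

```python
sentence = "NLP NAME IS RELATED TO DATA SCIENCE".split()
window = 2

# one training example per (target position, in-window neighbor) combination
n_examples = sum(
    min(i + window, len(sentence) - 1) - max(i - window, 0)
    for i in range(len(sentence))
)
print(n_examples)  # 22 examples from a single 7-word sentence
```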
CBOW vs Skip-Gram (Simple Comparison)
| Model | Input | Output |
|-------|-------|--------|
| CBOW | Context words | Target word |
| Skip-Gram | Target word | Context words |
