What does Word2Vec actually do?
Word2Vec is an NLP technique that learns numerical vector representations of words based on their surrounding words, so that similar words have similar vectors.
That is, if two words appear in similar contexts (the same kinds of sentences), Word2Vec makes their vectors similar.
Example:
- happy shows up in the same kinds of sentences as excited
- pizza shows up in the same kinds of sentences as burger
So Word2Vec learns:
- happy ≈ excited
- pizza ≈ burger
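To make "similar contexts" concrete, here is a minimal sketch that only collects the neighbouring words of each word in a tiny corpus. It is not Word2Vec training; it just shows the raw signal that Word2Vec learns from. The corpus, the `context_sets` helper, and the `window` parameter are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus, invented for this illustration only.
corpus = [
    "i feel happy today",
    "i feel excited today",
    "we ate pizza for lunch",
    "we ate burger for lunch",
]

def context_sets(sentences, window=1):
    """Map each word to the set of words seen within `window` positions of it."""
    contexts = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    contexts[word].add(tokens[j])
    return contexts

contexts = context_sets(corpus)
print(contexts["happy"] == contexts["excited"])  # same neighbours
print(contexts["pizza"] == contexts["burger"])   # same neighbours
print(contexts["happy"] & contexts["pizza"])     # no shared neighbours
```

In this toy corpus, happy and excited share exactly the same neighbours, and so do pizza and burger, while happy and pizza share none. Word2Vec turns this kind of overlap into nearby vectors.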
Word Embeddings as Feature Representation
Suppose each word gets a value for each feature, as in this table:
| Feature / Word | Boy | Girl | King | Queen | Apple | Mango |
| --- | --- | --- | --- | --- | --- | --- |
| Gender | -1 | 1 | -0.95 | 0.93 | 0.01 | 0.05 |
| Royal | 0.01 | 0.02 | 0.95 | 0.96 | -0.02 | 0.02 |
| Age | 0.03 | 0.02 | 0.75 | 0.68 | 0.95 | 0.96 |
| Food | ~0 | ~0 | ~0 | ~0 | High | High |
| … | … | … | … | … | … | … |
| Dimension n | … | … | … | … | … | … |
Let us take our vocabulary:
Boy, Girl, King, Queen, Apple, Mango
In a hypothetical world (just for understanding), we can imagine that Word2Vec learns features like:
- Gender
- Royal
- Age
- Food
- … many more (typically 100–300 dimensions in real models)
Each word gets a score for every feature.
What does our table show?
Each column is a word.
Each row is a hidden feature.
For example:
🔹 Gender feature
- Boy → −1 (male)
- Girl → +1 (female)
- King → −0.95 (male)
- Queen → +0.93 (female)
- Apple, Mango → near 0 (no gender)
So, on this feature alone, Word2Vec learns:
👉 Boy ≈ King (both male)
👉 Girl ≈ Queen (both female)
🔹 Royal feature
- King, Queen → very high values (~0.95)
- Boy, Girl → near zero
- Fruits → near zero
So the model understands:
👉 King and Queen are related by royalty.
🔹 Age feature
- King, Queen → higher (adult)
- Boy, Girl → lower (young)
So:
👉 King ≠ Boy (they differ in age, and also in royalty).
🔹 Food feature
- Apple, Mango → high
- Humans → near zero
So:
👉 Apple ≈ Mango (both fruits).
What do we observe?
From these numbers, we can read off what Word2Vec has learned:
- King, Queen are similar (royalty)
- Boy, Girl are similar (human + young)
- Apple, Mango are similar (food)
- King and Prince would be similar except for age (if Prince existed)
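The observations above can be checked with a small sketch that treats each column of the table as a vector and compares two words feature by feature. It uses only the Gender, Royal and Age rows for the four human words, because the Food row has no exact numbers. The names `FEATURES`, `VECTORS` and `feature_gaps` are hypothetical helpers, not part of any library.

```python
FEATURES = ["Gender", "Royal", "Age"]

# Scores copied from the Gender, Royal and Age rows of the table.
VECTORS = {
    "Boy":   [-1.0, 0.01, 0.03],
    "Girl":  [1.0, 0.02, 0.02],
    "King":  [-0.95, 0.95, 0.75],
    "Queen": [0.93, 0.96, 0.68],
}

def feature_gaps(word_a, word_b):
    """Absolute difference between two words on each feature."""
    return {
        name: round(abs(a - b), 2)
        for name, a, b in zip(FEATURES, VECTORS[word_a], VECTORS[word_b])
    }

print(feature_gaps("Boy", "Girl"))    # big gap only on Gender
print(feature_gaps("King", "Queen"))  # big gap only on Gender
print(feature_gaps("King", "Boy"))    # small gap on Gender, larger on Royal and Age
```

A small gap on a feature means the two words agree on that property; a large gap shows where they differ.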
Meaning Arithmetic
Because meaning is stored as numbers, we can do math:
- King = Royal + Male + Adult
- Queen = Royal + Female + Adult
So:
King − Male + Female ≈ Queen
This is why Word2Vec can perform:
king − man + woman ≈ queen
It works because each vector contains semantic features.
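Here is a rough sketch of the same arithmetic on the toy table. Since the table has Boy and Girl rather than man and woman, it computes King − Boy + Girl and then ranks every word by cosine similarity to the result (cosine similarity is explained in the next part). Only the Gender, Royal and Age scores of the four human words are used, and the `cosine` helper is written by hand for illustration.

```python
import math

# Gender, Royal, Age scores copied from the table (human words only).
vectors = {
    "Boy":   [-1.0, 0.01, 0.03],
    "Girl":  [1.0, 0.02, 0.02],
    "King":  [-0.95, 0.95, 0.75],
    "Queen": [0.93, 0.96, 0.68],
}

def cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# King - Boy + Girl, computed feature by feature.
target = [
    k - b + g
    for k, b, g in zip(vectors["King"], vectors["Boy"], vectors["Girl"])
]

# Rank all words from most to least similar to the target vector.
ranked = sorted(vectors, key=lambda w: cosine(target, vectors[w]), reverse=True)

print([round(x, 2) for x in target])
print(ranked)
```

With these toy numbers, Queen comes out on top of the ranking. Real libraries usually also leave the input words out of the candidate list; that step is skipped here because the vocabulary is so small.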
So, Word2Vec does not just count words.
It learns properties of words (gender, category, relationships) from data.
Similar words get similar vectors.
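In reality:
- These are NOT just 3–4 features.
- Each word usually has 100–300 hidden dimensions.
- The dimensions are learned from data and do not come with readable labels like Gender or Royal.
We only show a few named features for understanding.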
What is Cosine Similarity (in Word2Vec)?
In Word2Vec, every word is a vector (arrow) in space.

Suppose, as shown in the diagram, we have two vectors for two words.
Example:
- “king” → one vector
- “man” → another vector
Now the big question is:
👉 How do we know whether two words are similar?
We look at the ANGLE between the vectors.
That measurement is called:
✅ Cosine Similarity
Cosine similarity measures how close two word vectors are by checking the angle between them. The lengths of the vectors do not matter.
Small angle → similar words
Large angle → different words
We can also turn the similarity into a distance:
Small distance → similar words
Large distance → different words
Mathematically:
Cosine Similarity = cos(θ)
Distance = 1 − Cosine Similarity
Where θ = angle between the two vectors.
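In practice we do not measure θ directly. A minimal sketch, assuming plain Python lists as vectors: cos(θ) is the dot product of the two vectors divided by the product of their lengths. The function names and the 2D example vectors are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    length_a = math.sqrt(sum(x * x for x in a))
    length_b = math.sqrt(sum(y * y for y in b))
    return dot / (length_a * length_b)

def cosine_distance(a, b):
    """Distance = 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)

right = [1.0, 0.0]
diagonal = [1.0, 1.0]    # 45 degrees away from `right`
up = [0.0, 1.0]          # 90 degrees away from `right`
long_diagonal = [5.0, 5.0]  # same direction as `diagonal`, but longer

print(round(cosine_similarity(right, diagonal), 3))
print(round(cosine_similarity(right, up), 3))
print(round(cosine_similarity(right, long_diagonal), 3))
print(round(cosine_distance(right, diagonal), 3))
```

The first and third results are the same, because stretching a vector changes its length but not its direction.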
Case 1: Similar words (45° apart)
Imagine:
- One vector = king
- Second vector = man
They point almost in the same direction.
Let’s say the angle between them is:
👉 45°
So, cos(45°) = 1/√2 ≈ 0.707
Cosine Similarity = 0.707 (high)
Now calculate distance:
Distance = 1 − 0.707 ≈ 0.29
What does this mean?
- Similarity ≈ 0.7 (quite high)
- Distance ≈ 0.29 (small)
So we say:
👉 “king” and “man” are similar words.
Case 2: Not similar words (90° apart)
Now imagine two vectors:
- One points right
- One points up
They form 90° (right angle).
cos(90°) = 0
So:
Cosine Similarity = 0
Now distance:
Distance = 1 − 0 = 1
What does this mean?
- Similarity = 0
- Distance = 1 (large; the maximum of 2 is reached only at 180°)
So we say:
👉 These two words are NOT related at all.
Example:
- “king” and “mango”
- “happy” and “table”
| Angle | Cosine Similarity | Distance | Meaning |
| --- | --- | --- | --- |
| 0° | 1 | 0 | Same / very similar |
| 45° | ~0.7 | ~0.3 | Similar |
| 90° | 0 | 1 | Not related |
| 180° | −1 | 2 | Opposite direction |
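The numbers in this table can be reproduced with a few lines of Python. This is just a quick check using the standard `math` module; the `rows` list is a hypothetical name used to hold the results.

```python
import math

rows = []
for angle in (0, 45, 90, 180):
    similarity = math.cos(math.radians(angle))  # cos expects radians
    distance = 1 - similarity
    rows.append((angle, similarity, distance))
    print(f"{angle:>3} deg  similarity={similarity:+.3f}  distance={distance:.3f}")
```

Because of floating-point rounding, cos(90°) comes out as a tiny number very close to 0 rather than exactly 0, which is why the values are formatted to three decimals.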
Why Cosine Similarity is IMPORTANT for Word2Vec
Word2Vec converts words into vectors.
But vectors alone are useless unless we can:
✅ compare words
✅ find closest words
✅ detect similarity
Cosine similarity does exactly this.
It helps to:
- Find similar words
- Do word arithmetic
- Suggest related words
- Power chatbots and search engines
Summary
- Word2Vec → words become vectors
- Cosine similarity → measures angle
- Small angle → similar words
- 90° → unrelated words
That’s how Word2Vec mathematically understands meaning.
How does Word2Vec get trained?
There are two main ways to train a Word2Vec model:
- CBOW (Continuous Bag of Words)
- Skip-Gram
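As a rough preview of how the two differ: CBOW uses the surrounding words to predict the word in the middle, while Skip-Gram uses the middle word to predict each surrounding word. The sketch below only builds the training examples for each style from one sentence; it does not train anything. The sentence, the `training_examples` helper, and the `window` size are assumptions made for illustration.

```python
def training_examples(tokens, window=1):
    """Build (input, output) examples in CBOW style and Skip-Gram style."""
    cbow, skip_gram = [], []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(start, stop) if j != i]
        # CBOW: all context words together -> predict the target word.
        cbow.append((context, target))
        # Skip-Gram: the target word -> predict each context word separately.
        skip_gram.extend((target, c) for c in context)
    return cbow, skip_gram

tokens = "the king rules the land".split()
cbow, skip_gram = training_examples(tokens)

print(cbow[1])        # (['the', 'rules'], 'king')
print(skip_gram[:3])  # [('the', 'king'), ('king', 'the'), ('king', 'rules')]
```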
