Summary: CBOW Architecture

Architecture

CBOW consists of:

  1. Input Layer – Context words (one-hot vectors)
  2. Hidden Layer (Embedding Layer) – Dense embeddings
  3. Output Layer – Target word prediction

Two weight matrices:

  • W₁ (Input → Hidden) of size V × N
  • W₂ (Hidden → Output) of size N × V

Where:

  • V = Vocabulary size
  • N = Embedding dimension
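As a minimal sketch of this setup (the sizes V = 10 and N = 4 are toy values chosen here for illustration), the two weight matrices can be initialized like this:

```python
import numpy as np

V, N = 10, 4  # toy vocabulary size and embedding dimension
rng = np.random.default_rng(0)

W1 = rng.normal(scale=0.1, size=(V, N))  # input -> hidden: one row per word
W2 = rng.normal(scale=0.1, size=(N, V))  # hidden -> output

print(W1.shape, W2.shape)  # (10, 4) (4, 10)
```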

Weight Matrix W₁ (Input → Hidden)

Each row of W₁ represents the embedding of one word.

Embedding Lookup

For a one-hot input word x_i (a 1 × V vector with a 1 at position i), the hidden layer is:

h = x_i W₁

This simply selects row i of W₁:

v_i = [w_i1, w_i2, …, w_iN]

For k context words, the lookup yields k embeddings v₁, v₂, …, v_k.
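The lookup can be sketched in NumPy (toy sizes assumed): multiplying a one-hot row vector by W₁ gives exactly the same result as indexing row i directly, which is why implementations skip the matrix multiplication.

```python
import numpy as np

V, N = 10, 4
W1 = np.arange(V * N, dtype=float).reshape(V, N)  # toy weight matrix

i = 3
x = np.zeros(V)
x[i] = 1.0                    # one-hot vector for word i

v_matmul = x @ W1             # full vector-matrix multiplication
v_lookup = W1[i]              # direct row lookup -- same result, cheaper

print(np.allclose(v_matmul, v_lookup))  # True
```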

Context Aggregation

CBOW averages all k context embeddings:

C = (1/k) (v₁ + v₂ + … + v_k)

This produces a single context vector C of dimension N.
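The averaging step, sketched with NumPy (the embedding values here are illustrative):

```python
import numpy as np

# embeddings of k = 3 context words (toy values, N = 4)
v1 = np.array([1.0, 2.0, 3.0, 4.0])
v2 = np.array([2.0, 0.0, 2.0, 0.0])
v3 = np.array([0.0, 4.0, 1.0, 2.0])

embeddings = np.stack([v1, v2, v3])
C = embeddings.mean(axis=0)   # average over the k context words

print(C)  # [1. 2. 2. 2.]
```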

Weight Matrix W₂ (Hidden → Output)

Output Computation

The context vector is multiplied with W₂:

Z = C W₂

This gives:

Z = [z₁, z₂, …, z_V]

Raw scores for all vocabulary words.
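In NumPy this is a single vector-matrix product (toy sizes and random values assumed):

```python
import numpy as np

N, V = 4, 10
rng = np.random.default_rng(1)

C = rng.normal(size=N)        # context vector from the averaging step
W2 = rng.normal(size=(N, V))  # hidden -> output weights

Z = C @ W2                    # raw scores, one per vocabulary word
print(Z.shape)  # (10,)
```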

Softmax

Softmax:

  1. Takes raw scores z₁, z₂, …, z_V
  2. Converts them to exponentials
  3. Divides each by the total sum
  4. Produces probabilities between 0 and 1
  5. All probabilities sum to 1

Softmax converts scores into probabilities:

P(w_j) = exp(z_j) / Σ_{m=1…V} exp(z_m)

  • z_j → raw score
  • V → vocabulary size
  • P(w_j) → probability of the j-th word

The denominator normalizes the scores so that they sum to 1.

The word with highest probability is predicted.
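A small, numerically stable softmax sketch (subtracting the maximum score before exponentiating avoids overflow without changing the result):

```python
import numpy as np

def softmax(z):
    """Convert raw scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

Z = np.array([2.0, 1.0, 0.1])
P = softmax(Z)

print(P.argmax())  # 0 -> the word with the highest score is predicted
```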

Loss Calculation

Cross-entropy loss:  L = −log(P(y_true))
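The loss for a single example, given the softmax output and the index of the true target word (toy probabilities assumed):

```python
import numpy as np

P = np.array([0.7, 0.2, 0.1])  # softmax probabilities (toy values)
y_true = 0                      # index of the correct target word

loss = -np.log(P[y_true])       # cross-entropy for a one-hot target
print(round(loss, 4))  # 0.3567
```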

Backpropagation

The error is propagated backward to update:

  • W₂ (hidden → output)
  • W₁ (input → hidden / embeddings)
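A minimal gradient sketch for one training step, under the standard softmax + cross-entropy result dL/dZ = P − one_hot(target); since the context vector is an average, its gradient is split equally across the k context rows of W₁. All sizes and indices here are illustrative:

```python
import numpy as np

V, N, k, lr = 10, 4, 2, 0.05
rng = np.random.default_rng(2)

W1 = rng.normal(scale=0.1, size=(V, N))
W2 = rng.normal(scale=0.1, size=(N, V))

context = [1, 5]                 # indices of the k context words
target = 3                       # index of the true center word

# forward pass
C = W1[context].mean(axis=0)
Z = C @ W2
P = np.exp(Z - Z.max()); P /= P.sum()

# backward pass: for softmax + cross-entropy, dL/dZ = P - one_hot(target)
dZ = P.copy(); dZ[target] -= 1.0
dW2 = np.outer(C, dZ)            # gradient for hidden -> output
dC = W2 @ dZ                     # gradient flowing back into C

W2 -= lr * dW2
for i in context:                # averaging splits dC equally over k rows
    W1[i] -= lr * dC / k

# the loss on this example decreases after the update
C2 = W1[context].mean(axis=0)
Z2 = C2 @ W2
P2 = np.exp(Z2 - Z2.max()); P2 /= P2.sum()
print(-np.log(P2[target]) < -np.log(P[target]))  # True
```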

Repeat

The process is repeated for:

  • all sliding windows
  • all sentences
  • multiple epochs
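Putting all the steps together, here is a minimal end-to-end training sketch; the corpus, window size, and hyperparameters are illustrative assumptions, not from the original, and a real implementation would use negative sampling or hierarchical softmax instead of the full softmax shown here:

```python
import numpy as np

corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
word2idx = {w: i for i, w in enumerate(vocab)}

V, N, window, lr, epochs = len(vocab), 8, 2, 0.05, 50
rng = np.random.default_rng(3)
W1 = rng.normal(scale=0.1, size=(V, N))
W2 = rng.normal(scale=0.1, size=(N, V))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

losses = []
for epoch in range(epochs):                       # multiple epochs
    total = 0.0
    for pos, word in enumerate(corpus):           # all sliding windows
        context = [word2idx[corpus[j]]
                   for j in range(max(0, pos - window),
                                  min(len(corpus), pos + window + 1))
                   if j != pos]
        target = word2idx[word]

        C = W1[context].mean(axis=0)              # aggregate context
        P = softmax(C @ W2)                       # predict target word
        total += -np.log(P[target])               # cross-entropy loss

        dZ = P.copy(); dZ[target] -= 1.0          # backpropagate
        dC = W2 @ dZ
        W2 -= lr * np.outer(C, dZ)
        for i in context:
            W1[i] -= lr * dC / len(context)
    losses.append(total)

print(losses[-1] < losses[0])  # loss decreases over epochs -> True
```

After training, the rows of W1 are the learned word embeddings.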