Word2vec

Overview


The word2vec algorithm represents a word as a vector. The vectors output by the word2vec algorithm carry meaning beyond just identifying a word. In fact, the dot product of two word vectors gives a measure of the similarity in meaning between the two words.
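As an illustrative sketch, the following Python snippet compares hypothetical 4-dimensional word vectors (the values here are made up; real word2vec vectors are learned and typically have a few hundred dimensions). In practice the dot product is usually taken after normalizing the vectors, which is the cosine similarity:

    import numpy as np

    # Hypothetical, hand-picked vectors purely for illustration.
    vectors = {
        "king":  np.array([0.8, 0.3, 0.1, 0.9]),
        "queen": np.array([0.7, 0.4, 0.2, 0.8]),
        "apple": np.array([0.1, 0.9, 0.8, 0.1]),
    }

    def similarity(a, b):
        # Cosine similarity: the dot product of the normalized vectors.
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    print(similarity(vectors["king"], vectors["queen"]))  # ~0.99, similar meaning
    print(similarity(vectors["king"], vectors["apple"]))  # ~0.34, dissimilar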

There are multiple versions of the word2vec algorithm, the two main ones being skip-gram and continuous bag of words (CBOW).

Skip Gram Algorithm


A skip gram is a sequence of words where one word (typically the middle word) has been omitted. The words in the sequence that aren't removed are referred to as "context words".

As an example, the skip gram algorithm may choose a sequence size of 5 words. The middle word becomes the skip word and the other four words become context words.
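As a sketch of this step, the following Python snippet extracts (skip word, context word) pairs from a tokenized sentence using a 5-word window, i.e. two context words on each side. The sentence and helper name are made up for illustration:

    def skip_gram_pairs(tokens, window=2):
        # Pair each center (skip) word with every context word up to
        # `window` positions away on either side.
        pairs = []
        for i, center in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    pairs.append((center, tokens[j]))
        return pairs

    sentence = "the quick brown fox jumps over the lazy dog".split()
    for center, context in skip_gram_pairs(sentence)[:4]:
        print(center, "->", context)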

Next, the skip word is encoded as a one-hot encoded vector, as in the bag of words model.
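A minimal sketch of one-hot encoding, assuming a toy 5-word vocabulary (a real vocabulary would contain thousands of words):

    import numpy as np

    vocab = ["the", "quick", "brown", "fox", "jumps"]
    word_to_index = {w: i for i, w in enumerate(vocab)}

    def one_hot(word):
        # A vector of zeros with a single 1 at the word's index.
        vec = np.zeros(len(vocab))
        vec[word_to_index[word]] = 1.0
        return vec

    print(one_hot("brown"))  # [0. 0. 1. 0. 0.]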

This vector becomes the input to a two-layer neural network.

The output of the network is a set of probabilities over one-hot encoded words; that is, the output has the same size as the input. It is used to predict the probability of each word being a context word. (Note that in this case the input vector and the output vector are both large, such as 10,000 elements.)

The hidden layer is smaller than the one-hot encoded input and output vectors, typically a few hundred elements.
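The following sketch shows the shape of such a network, assuming a 10,000-word vocabulary and a 300-unit hidden layer as illustrative sizes. In word2vec the hidden layer is linear (no activation function), and a softmax over the output scores yields the probabilities:

    import numpy as np

    vocab_size, hidden_size = 10_000, 300
    W_in = np.random.randn(vocab_size, hidden_size) * 0.01   # input -> hidden
    W_out = np.random.randn(hidden_size, vocab_size) * 0.01  # hidden -> output

    def forward(one_hot_vec):
        hidden = one_hot_vec @ W_in           # linear hidden layer, no bias
        scores = hidden @ W_out
        exp = np.exp(scores - scores.max())   # numerically stable softmax
        return exp / exp.sum()                # probabilities over the vocabulary

    x = np.zeros(vocab_size)
    x[42] = 1.0                               # one-hot input for word index 42
    print(forward(x).shape)                   # (10000,) - same size as the input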

The network is then trained to predict the context words for each skip word. In the example given with 5-word sequences, each skip word is used to try to predict each of the four context words, and the model is trained to maximize its ability to do so.
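A minimal sketch of one training step under this objective, repeating the illustrative setup above for self-containment. A single (skip word, context word) pair nudges the weights toward predicting that context word; this uses a plain softmax with cross-entropy, whereas real implementations typically use tricks such as negative sampling to make training tractable:

    import numpy as np

    vocab_size, hidden_size = 10_000, 300
    W_in = np.random.randn(vocab_size, hidden_size) * 0.01   # input -> hidden
    W_out = np.random.randn(hidden_size, vocab_size) * 0.01  # hidden -> output

    def train_pair(center_idx, context_idx, W_in, W_out, lr=0.05):
        hidden = W_in[center_idx]              # row lookup == one_hot @ W_in
        scores = hidden @ W_out
        exp = np.exp(scores - scores.max())
        probs = exp / exp.sum()                # predicted context distribution

        # Cross-entropy gradient w.r.t. the scores: predicted minus target.
        grad_scores = probs.copy()
        grad_scores[context_idx] -= 1.0

        # Gradient descent update on both weight matrices (in place).
        W_in[center_idx] -= lr * (W_out @ grad_scores)
        W_out -= lr * np.outer(hidden, grad_scores)

    # One step: skip word at index 42 predicting a context word at index 7.
    train_pair(42, 7, W_in, W_out)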

Lastly, the output layer of the neural network is discarded. The vector produced by the first layer becomes the output of the word2vec algorithm; this vector is much smaller than the one-hot encoded vectors. Note that two words that produce the same output in this hidden layer would produce the same probabilities in the full network, which means the network treats the two words as having the same meaning.
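Concretely, continuing the sketch above, the learned word vectors are just the rows of the first weight matrix:

    import numpy as np

    # W_in stands in for the trained input weight matrix; W_out is discarded.
    vocab_size, hidden_size = 10_000, 300
    W_in = np.random.randn(vocab_size, hidden_size)

    word_vector = W_in[42]      # dense 300-element vector for word index 42
    print(word_vector.shape)    # (300,) rather than (10000,)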

Compression


The word2vec algorithm can be understood as a form of compression. The one-hot word encoding is a sparse, high-dimensional representation of each word, while the output of the algorithm is a much smaller, dense representation. Two words that produce the same word2vec result are interpreted as having the same meaning; in this way, the redundancy of a language using two words with the same meaning is eliminated.
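A back-of-the-envelope comparison using the illustrative sizes from above:

    import numpy as np

    one_hot = np.zeros(10_000, dtype=np.float32)   # sparse: a single 1, rest 0s
    embedding = np.zeros(300, dtype=np.float32)    # dense learned vector
    print(one_hot.nbytes)     # 40000 bytes per word
    print(embedding.nbytes)   # 1200 bytes per word, roughly 33x smaller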

Example Walkthrough