bash FT_installation.bash
WARNING: the tool will be installed by default into the "tools" directory located in your home directory.
source ~/.bashrc
fasttext --help
The result should print the help of the fasttext command.
The goal is to train your first word embedding models.
WARNING: Read carefully the documentation available here: https://github.com/facebookresearch/fastText
Exercise 1: Train two models for each dataset, with a vector dimension of 50, 5 iterations of training, and a window size of 15.
Q1: Train a cbow model, explain your command line (1pt)
Q2: Train a skip-gram model, explain your command line (1pt)
Q3: Give an example of the vector file and describe it. (1pt)
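The exact command lines are part of the answers to Q1 and Q2, but as a starting point, here is a hedged sketch using the fastText CLI with the required hyperparameters. The corpus file name data.txt is a placeholder; replace it with your actual dataset.

```shell
# Hypothetical corpus file name -- replace with your actual dataset.
CORPUS=data.txt
if command -v fasttext >/dev/null 2>&1; then
    # CBOW model: -dim 50 (vector size), -epoch 5 (training iterations), -ws 15 (window size)
    fasttext cbow -input "$CORPUS" -output model_cbow -dim 50 -epoch 5 -ws 15
    # Skip-gram model with the same hyperparameters
    fasttext skipgram -input "$CORPUS" -output model_sg -dim 50 -epoch 5 -ws 15
else
    echo "fasttext not found: run the installation step first"
fi
```

Each run produces a binary model (.bin) and a text vector file (.vec), the latter being the file you will load in the next part.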
Loading the WE is not difficult in itself.
To make computations fast, we store the whole set of embeddings in a numpy array of shape (num words, dimensions).
We also build a dictionary (vocab) mapping each word to its row index in this array, and a list (rev_vocab) mapping indices back to word forms.
import numpy as np

def load(filename):
    vocab = {}
    rev_vocab = []
    lines = open(filename).readlines()
    header = lines[0].split(" ")
    vectors = np.zeros((int(header[0]), int(header[1])))
    for i, line in enumerate(lines):
        tokens = line.strip().split(" ")
        if i > 0:
            vocab[tokens[0]] = i - 1
            rev_vocab.append(tokens[0])
            vectors[i - 1] = [float(value) for value in tokens[1:]]
    return vocab, rev_vocab, vectors
Exercise 1: Loading
Q1: Write in Python the script needed to load the WE. (1pt)
Q2: What do "vocab", "rev_vocab" and "vectors" stand for? (1pt)
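To check your loader, you can run it on a tiny hand-written vector file. This sketch assumes the standard fastText .vec text format (a "num_words dimensions" header line, then one "word v1 ... vd" line per word); the file name toy.vec and its contents are made up for the test.

```python
import numpy as np

def load(filename):
    # Parse a fastText .vec file: header "num_words dim", then one word per line.
    vocab = {}          # word -> row index in the vectors array
    rev_vocab = []      # row index -> word form
    lines = open(filename).readlines()
    header = lines[0].split(" ")
    vectors = np.zeros((int(header[0]), int(header[1])))
    for i, line in enumerate(lines):
        tokens = line.strip().split(" ")
        if i > 0:  # skip the header line
            vocab[tokens[0]] = i - 1
            rev_vocab.append(tokens[0])
            vectors[i - 1] = [float(value) for value in tokens[1:]]
    return vocab, rev_vocab, vectors

# Tiny synthetic vector file for testing (2 words, 3 dimensions).
with open("toy.vec", "w") as f:
    f.write("2 3\n")
    f.write("dog 1.0 0.0 0.5\n")
    f.write("cat 0.9 0.1 0.4\n")

vocab, rev_vocab, vectors = load("toy.vec")
print(vectors.shape)  # (2, 3)
```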
Exercise 2: Compute the cosine similarity for each model and each dataset
Q1: Explain what the cosine similarity is. (1pt)
Q2: How do you compute it in Python using numpy? And using scipy? (2pt)
Q3: What is the cosine similarity between the vectors representing "dog" and "cat", and what about "dog" and "dentist"? (1pt)
Q4: What is the closest word to "bank": "river" or "trade"? (1pt)
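As a hint for Q2: cosine similarity is the dot product of the two vectors divided by the product of their norms. A minimal sketch with made-up example vectors follows; note that scipy.spatial.distance.cosine returns a *distance*, i.e. 1 minus the similarity.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity: dot product normalized by both vector norms.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([3.0, 2.0, 1.0])

np_sim = cosine_sim(a, b)  # 10 / 14

try:
    from scipy.spatial.distance import cosine as cosine_dist
    sp_sim = 1 - cosine_dist(a, b)  # scipy gives the distance, so invert it
except ImportError:
    sp_sim = None  # scipy not installed
```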
Now that you have your WE model, you will use a trick to compute all the closest words.
This can be done once and for all by computing the dot product between the matrix containing all
vectors and the transpose of the target word vector (np.dot(vectors, v.T)).
Then we can use a numpy trick to recover the indices of the n highest scores.
def closest(vectors, vector, n=10):
    n = n + 1
    scores = np.dot(vectors, vector.T)
    indices = np.argpartition(scores, -n)[-n:]
    indices = indices[np.argsort(scores[indices])]
    output = []
    for i in [int(x) for x in indices]:
        output.append((scores[i], i))
    return reversed(output)
Exercise 1: Code the function
Q1: What preprocessing do you NEED to apply to all the vectors before using the dot product instead of the cosine? (3pt)
Q2: What does "argpartition" do? (1pt)
Q3: Add comments to the code. (1pt)
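A hint for Q1: the dot product of two vectors equals their cosine similarity only when both are unit-length, so the embedding matrix must be L2-normalized row by row beforehand. The sketch below uses made-up toy vectors and a lightly adapted version of the closest function above (it returns a list instead of an iterator); the normalization step is the key point.

```python
import numpy as np

def closest(vectors, vector, n=10):
    n = n + 1  # the query word itself will appear in the results
    scores = np.dot(vectors, vector.T)
    indices = np.argpartition(scores, -n)[-n:]
    indices = indices[np.argsort(scores[indices])]
    return [(scores[i], int(i)) for i in reversed(indices)]

# Toy embedding matrix (4 words, 3 dimensions) -- hypothetical values.
vectors = np.array([[1.0, 0.0, 0.5],
                    [0.9, 0.1, 0.4],
                    [0.0, 1.0, 0.2],
                    [0.1, 0.9, 0.3]])

# L2-normalize every row so that dot product == cosine similarity.
norms = np.linalg.norm(vectors, axis=1, keepdims=True)
vectors = vectors / norms

# Nearest neighbors of row 0: row 0 itself (score 1.0) comes first.
results = closest(vectors, vectors[0], n=2)
```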
Exercise 2: Analysis
Q1: What are the closest words to "apple"? (1pt)
Q2: What about the neighborhoods of other words? (1pt)
Q3: Can you find words that have a strange neighborhood? (1pt)
Q4: Check your answers using scipy.spatial.distance.cosine. Given the scores obtained, what can you conclude about the dot product?
Word analogies can be exposed by translating a word vector in a direction that corresponds to a linguistic or semantic relationship between two other words. If w1 and w2 are in a relation R(w1, w2), we can compute the relation vector r = vec(w2) - vec(w1) and then apply it to the vector of another word, vec(w3). The word closest to vec(w3) + r should exhibit the same relation. The idea is therefore to use the closest function to find vectors similar to vec(w2) - vec(w1) + vec(w3).
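The vector arithmetic above can be sketched on hand-crafted toy embeddings. Everything here (the word list and the vector values) is made up so that the king/queen analogy holds exactly; a real run would use the loaded fastText vectors and the closest function.

```python
import numpy as np

# Toy embeddings chosen so that king - man + woman == queen exactly.
words = ["man", "woman", "king", "queen", "dog"]
vocab = {w: i for i, w in enumerate(words)}
vectors = np.array([[1.0, 0.0, 0.0],   # man
                    [0.0, 1.0, 0.0],   # woman
                    [1.0, 0.0, 1.0],   # king
                    [0.0, 1.0, 1.0],   # queen
                    [0.3, 0.3, 0.0]])  # dog (distractor)

# L2-normalize rows so that dot product == cosine similarity.
vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def analogy(w1, w2, w3):
    # "w1 is to w2 what w3 is to ?": r = vec(w2) - vec(w1),
    # then find the word closest to vec(w3) + r,
    # excluding the three input words themselves.
    target = vectors[vocab[w2]] - vectors[vocab[w1]] + vectors[vocab[w3]]
    scores = np.dot(vectors, target)
    for i in np.argsort(-scores):          # indices sorted by descending score
        if words[i] not in (w1, w2, w3):
            return words[i]

print(analogy("man", "king", "woman"))  # queen
```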
Exercise 1: Code the analogy function
Q1: Can you reuse the previous code? To what extent? (1pt)
Exercise 2: Solve the analogies
Q1: "paris" is to "france" what "delhi" is to ... ? (1pt)
Q2: "gates" - "microsoft" + "apple" = ... ? (1pt)
Q3: "king" - "man" + "woman" = ... ? (1pt)
Q4: "slow" - "slower" + "fast" = ... ? (1pt)
Exercise 3: Bonus
Q1: Increase the dimension of the WE to 300 and retrain the models. (1pt)
Q2: What is the impact on the analogies? (1pt)