Demo 1: Deep Learning for NLP - Word Embeddings

Christophe Servan


Tools

Data


Work to do

      This TP is meant to help you manipulate word embeddings (WE)

    I. Train word embeddings

      The main idea is to train your first word embedding model

      WARNING: Carefully read the documentation available here: https://github.com/facebookresearch/fastText

        Exercise 1: Train two models for each dataset, with a vector dimension of 50, 5 iterations of training and a window size of 15 (a sketch is given after the questions below).

          Q1: Train a cbow model and explain your command line (1pt)

          Q2: Train a skip-gram model and explain your command line (1pt)

          Q3: Give an example of the vector file and describe it (1pt)
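
          A possible sketch, assuming a training corpus named data.txt (all file names below are placeholders). The command lines in the comments follow the fastText README; the code calls the official fastText Python bindings with the same hyper-parameters:

              # Equivalent command lines from the fastText README (they also write the
              # .vec text files used in part II); file names are placeholders:
              #   ./fasttext cbow     -input data.txt -output model_cbow -dim 50 -epoch 5 -ws 15
              #   ./fasttext skipgram -input data.txt -output model_sg   -dim 50 -epoch 5 -ws 15
              import fasttext

              # CBOW model: 50-dimensional vectors, 5 training epochs, window size of 15
              cbow_model = fasttext.train_unsupervised("data.txt", model="cbow",
                                                       dim=50, epoch=5, ws=15)
              cbow_model.save_model("model_cbow.bin")

              # Skip-gram model with the same hyper-parameters
              sg_model = fasttext.train_unsupervised("data.txt", model="skipgram",
                                                     dim=50, epoch=5, ws=15)
              sg_model.save_model("model_sg.bin")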

    II. Loading word embeddings

      Loading the WE is not difficult in itself. To make computations fast, we store the whole set of embeddings in a numpy array of shape (num_words, dimensions).
      We also build a dictionary (vocab) mapping each word to its row index in this array, and a list (rev_vocab) mapping indices back to word forms.

              import numpy as np

              def load(filename):
                  """Load a .vec file into a vocabulary, a reverse vocabulary and a matrix."""
                  vocab = {}      # word -> row index in the matrix
                  rev_vocab = []  # row index -> word
                  with open(filename) as f:
                      lines = f.readlines()
                  # The first line gives the number of words and the vector dimension
                  num_words, dim = (int(x) for x in lines[0].split())
                  vectors = np.zeros((num_words, dim))
                  for i, line in enumerate(lines[1:]):
                      tokens = line.strip().split(" ")
                      vocab[tokens[0]] = i
                      rev_vocab.append(tokens[0])
                      vectors[i] = [float(value) for value in tokens[1:]]
                  return vocab, rev_vocab, vectors
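
      For instance, assuming the skip-gram vectors were written to a file named model_sg.vec (the name is only a placeholder):

              vocab, rev_vocab, vectors = load("model_sg.vec")
              print(vectors.shape)            # (number of words, 50)
              print(vocab["dog"])             # row index of the vector for "dog"
              print(rev_vocab[vocab["dog"]])  # "dog": rev_vocab inverts vocab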
      
                  

        Exercise 1: Loading

          Q1: Write the Python script needed to load the WE (1pt)

          Q2: What do "vocab", "rev_vocab" and "vectors" stand for? (1pt)

        Exercise 2: Compute the cosine similarity for each model and each dataset

          Q1: Explain what the cosine similarity is. (1pt)

          Q2: How do you compute it in Python using numpy? And using scipy? (2pt) A sketch is given after this exercise.

          Q3: What is the cosine similarity between the vectors representing "dog" and "cat", and what about "dog" and "dentist"? (1pt)

          Q4: What is the closest word to "bank": "river" or "trade"? (1pt)
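
          A possible sketch for Q2, assuming vocab and vectors come from the load function above (the word pair is just an example). Note that scipy computes a cosine distance, so the similarity is one minus it:

              import numpy as np
              from scipy.spatial.distance import cosine

              def cosine_similarity(u, v):
                  # cos(u, v) = (u . v) / (||u|| * ||v||)
                  return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

              u = vectors[vocab["dog"]]
              v = vectors[vocab["cat"]]
              print(cosine_similarity(u, v))  # numpy version
              print(1 - cosine(u, v))         # scipy version: 1 - cosine distance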

    III. Closest words

      Now that you have your WE models, you will use a trick to compute all the closest words.
      This can be done once and for all, by computing the dot product between the matrix containing all vectors and the transpose of the target word vector (np.dot(vectors, v.T)).
      Then we can use a numpy trick to recover the indices of the n highest scores.

              def closest(vectors, vector, n=10):
                  # Keep n+1 candidates because the target word itself gets the best score
                  n = n + 1
                  # A single dot product against the whole matrix scores every word at once
                  scores = np.dot(vectors, vector.T)
                  # argpartition places the n highest scores in the last n positions (unsorted)
                  indices = np.argpartition(scores, -n)[-n:]
                  # Sort those n candidates by score
                  indices = indices[np.argsort(scores[indices])]
                  output = [(scores[i], int(i)) for i in indices]
                  # Return (score, index) pairs from best to worst
                  return reversed(output)
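
      For example, assuming vocab, rev_vocab and vectors come from the load function of part II, the indices returned by closest can be mapped back to words:

              target = vectors[vocab["apple"]]
              for score, index in closest(vectors, target):
                  print(rev_vocab[index], score)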
      

        Exercise 1: Code the function

          Q1: What preprocessing do you NEED to apply to all the vectors before using the dot product instead of the cosine? (3pt)

          Q2: What does "argpartition" stand for? (1pt)

          Q3: Add comments to the code. (1pt)

        Exercise 2: Analysis

          Q1: What are the closest words to "apple"? (1pt)

          Q2: What about other words (close to "apple")? (1pt)

          Q3: Can you find words which have a strange neighborhood? (1pt)

          Q4: Check your answers using scipy.spatial.distance.cosine. Given the scores obtained, what can you conclude about the dot product?

    IV. Analogy

      Word analogies can be revealed by translating a word vector in a direction that corresponds to a linguistic or semantic relationship between two other words.
      So if we have w1 and w2 in a relation R(w1, w2), we can compute the relation vector r = vec(w2) - vec(w1), and then apply this relation to the vector of another word, vec(w3).
      The word closest to vec(w3) + r should exhibit the same relation.
      Therefore, the idea is to use the closest function to find the vectors most similar to vec(w2) - vec(w1) + vec(w3).
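
      A minimal sketch of such a function, reusing the closest function from part III and the structures returned by load (the function name and the filtering of the query words are design choices, not the only option):

              def analogy(vocab, rev_vocab, vectors, w1, w2, w3, n=10):
                  # Query vector: vec(w2) - vec(w1) + vec(w3)
                  query = vectors[vocab[w2]] - vectors[vocab[w1]] + vectors[vocab[w3]]
                  results = []
                  for score, index in closest(vectors, query, n):
                      word = rev_vocab[index]
                      # The query words themselves tend to dominate the neighborhood
                      if word not in (w1, w2, w3):
                          results.append((word, score))
                  return results

              # "paris" is to "france" what "delhi" is to ... ?
              print(analogy(vocab, rev_vocab, vectors, "paris", "france", "delhi"))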

        Exercise 1: Code the analogy function

          Q1: Can you reuse the previous code? To what extent? (1pt)

        Exercise 2: Solve the analogies

          Q1: "paris" is to "france" what "delhi" is to ... ? (1pt)

          Q2: "gates" - "microsoft" + "apple" = ... ? (1pt)

          Q3: "king" - "man" + "woman" = ... ? (1pt)

          Q4: "slow" - "slower" + "fast" = ... ? (1pt)

        Exercise 3: Bonus

          Q1: Increase the dimension of the WE to 300 and retrain the models (1pt)

          Q2: What is the impact on the analogies? (1pt)



The End.