Neural Networks for Machine Learning: Programming Assignment 2: Learning Word Representations


Programming Assignment 2: Learning Word Representations

Warning: The hard deadline has passed. You can attempt it, but you will not get credit for it. You are welcome to try it as a learning exercise.

We are now ready to start using neural nets for solving real problems!

In this assignment we will design a neural net language model. The model will learn to predict the next word given the previous three words. The network looks like this:

To get started, download either of the following archives:

  • assignment2.tar.gz
  • assignment2.zip

Or download each file individually:

  • README.txt
  • train.m
  • data.mat
  • raw_sentences.txt
  • fprop.m
  • word_distance.m
  • display_nearest_word.m
  • predict_next_word.m
  • load_data.m

The starter code implements a basic framework for training neural nets with mini-batch gradient descent. Your job is to write code to complete the implementation of forward and back propagation. See the README file for a description of the dataset, starter code and how to run it. 
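
As a rough illustration of what a forward pass for this kind of model involves, here is a minimal Python sketch. It is not the starter code (which is written in MATLAB/Octave in fprop.m). It assumes an architecture in which the three context words are looked up in a shared embedding table, concatenated, passed through a hidden layer of logistic units, and fed to a softmax over the vocabulary. All names and the toy sizes are hypothetical.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward(context, embed, w_hid, b_hid, w_out, b_out):
    """Return next-word probabilities for one context of word indices."""
    # Embedding lookup: concatenate the vectors of the context words.
    x = [v for idx in context for v in embed[idx]]
    # Hidden layer with logistic units (an assumption of this sketch).
    h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
         for row, b in zip(w_hid, b_hid)]
    # Output layer: one score per vocabulary word, then softmax.
    scores = [sum(w * hi for w, hi in zip(row, h)) + b
              for row, b in zip(w_out, b_out)]
    return softmax(scores)

def random_matrix(rows, cols):
    """Small random weights; the scale 0.1 is an arbitrary choice."""
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

# Toy sizes (hypothetical, far smaller than the settings in the questions).
random.seed(0)
vocab, dim, hidden, n_ctx = 6, 4, 5, 3
embed = random_matrix(vocab, dim)
w_hid = random_matrix(hidden, n_ctx * dim)
w_out = random_matrix(vocab, hidden)
probs = forward([0, 3, 5], embed, w_hid, [0.0] * hidden, w_out, [0.0] * vocab)
```

The result `probs` holds one probability per vocabulary word and sums to 1. The actual starter code works on whole mini-batches with matrix operations, so treat this per-example version only as a reading aid.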
This sample_output shows you what output to expect once everything is implemented correctly. 
Once you have implemented the required code and have the model running, answer the following questions. 
Happy coding!

Question 1

Train a model with a 50-dimensional embedding space, a 200-dimensional hidden layer, and the default settings for all other hyperparameters. What is the average training set cross entropy reported by the training program after 10 epochs? Please provide a numeric answer (three decimal places). [4 points]
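
For reference, average cross entropy is the mean of the negative log probability that the model assigns to the correct next word. The sketch below assumes the natural logarithm and uses made-up probabilities; the function name is hypothetical and this is not the training program's own code.

```python
import math

def average_cross_entropy(pred_probs, targets):
    """Mean of -log p(target) over all cases, using the natural log."""
    total = 0.0
    for probs, target in zip(pred_probs, targets):
        total += -math.log(probs[target])
    return total / len(targets)

# Two made-up predictions over a three-word vocabulary.
preds = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
targets = [0, 1]
ce = average_cross_entropy(preds, targets)  # (-ln 0.7 - ln 0.8) / 2
```

Lower values mean the model puts more probability on the words that actually occur.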

Question 2

Train a model for 10 epochs with a 50-dimensional embedding space, a 200-dimensional hidden layer, a learning rate of 0.0001, and the default settings for all other hyperparameters. What do you observe? [3 points]

Question 3

If all weights and biases in this network were set to zero and no training were performed, what would the average cross entropy on the training set be? Please provide a numeric answer (three decimal places). [3 points]
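
One way to reason about this question: if every output score is zero, a softmax output layer assigns the same probability to every word, so the cross entropy of each case is the log of the vocabulary size. The sketch below assumes a softmax output and the natural logarithm, and uses a hypothetical vocabulary size of 1000; the real size comes from the dataset described in the README.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

vocab_size = 1000  # hypothetical value, not the assignment's vocabulary size
# Zero weights and biases give a score of zero for every word.
uniform_probs = softmax([0.0] * vocab_size)
# Every target word has the same probability, so every case has the same loss.
uniform_ce = -math.log(uniform_probs[0])
```

Here `uniform_ce` equals `math.log(vocab_size)`; substituting the dataset's actual vocabulary size gives the value asked for.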

Question 4

Train three models, each with a 50-dimensional embedding space and a 200-dimensional hidden layer:
  • Model A: Learning rate = 0.001
  • Model B: Learning rate = 0.1
  • Model C: Learning rate = 10.0
Use the default settings for all other hyperparameters. Which model gives the lowest training set cross entropy after 1 epoch? [3 points]
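
To build intuition for how the learning rate changes the behaviour of gradient descent, here is a toy sketch on the one-dimensional function f(w) = 0.5 * w^2. It is only an illustration with a hypothetical helper name. The language model's loss surface is very different, so this does not predict which of the three models wins.

```python
def descend(lr, steps=20, w=1.0):
    """Plain gradient descent on f(w) = 0.5 * w**2, whose gradient is w."""
    for _ in range(steps):
        w -= lr * w
    return w

# Final value of w for each learning rate; the minimum is at w = 0.
final_by_lr = {lr: descend(lr) for lr in (0.001, 0.1, 10.0)}
```

On this toy problem the smallest rate barely moves, the middle rate makes steady progress, and the largest rate overshoots further on every step and diverges.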

Question 5

Of the models trained in Question 4, which one gives the lowest training set cross entropy after 10 epochs? [2 points]

Question 6

Train each of the following models:
  • Model A: 5-dimensional embedding, 100-dimensional hidden layer
  • Model B: 50-dimensional embedding, 10-dimensional hidden layer
  • Model C: 50-dimensional embedding, 200-dimensional hidden layer
  • Model D: 100-dimensional embedding, 5-dimensional hidden layer
Use the default values for all other hyperparameters. Which model gives the best training set cross entropy after 10 epochs of training? [3 points]

Question 7

Of the models trained in Question 6, which one gives the best validation set cross entropy after 10 epochs of training? [2 points]

Question 8

Train three models, each with a 50-dimensional embedding space and a 200-dimensional hidden layer:
  • Model A: Momentum = 0.0
  • Model B: Momentum = 0.5
  • Model C: Momentum = 0.9
Use the default settings for all other hyperparameters. Which model gives the lowest validation set cross entropy after 5 epochs? [3 points]
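
For readers unfamiliar with momentum, the sketch below shows one common formulation: a velocity accumulates past gradients, and the weight moves along the velocity. The starter code's exact update may differ, the helper name is hypothetical, and the toy function f(w) = 0.5 * w^2 says nothing definite about which model wins on the real data.

```python
def descend_with_momentum(momentum, lr=0.01, steps=50, w=1.0):
    """Gradient descent with momentum on f(w) = 0.5 * w**2."""
    velocity = 0.0
    for _ in range(steps):
        grad = w  # gradient of 0.5 * w**2
        # The velocity is a decaying sum of past gradients.
        velocity = momentum * velocity + grad
        w -= lr * velocity
    return w

# Final value of w for each momentum setting; the minimum is at w = 0.
final_by_momentum = {m: descend_with_momentum(m) for m in (0.0, 0.5, 0.9)}
```

With this small learning rate, higher momentum brings w closer to the minimum within the same number of steps on the toy problem.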

Question 9

Train a model with a 50-dimensional embedding layer and a 200-dimensional hidden layer for 10 epochs. Use the default values for all other hyperparameters. Which words are among the 10 closest words to the word 'day'? [2 points]
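
The starter files display_nearest_word.m and word_distance.m are meant for this kind of lookup. As a rough idea of what a nearest-word search does, here is a Python sketch that assumes Euclidean distance between embedding vectors. The function name, the tiny vocabulary, and the two-dimensional vectors are invented for illustration and are not taken from a trained model.

```python
import math

def nearest_words(word, vocab, embeddings, k):
    """Return the k words whose vectors are closest to `word` (Euclidean)."""
    target = embeddings[vocab.index(word)]
    distances = []
    for other, vec in zip(vocab, embeddings):
        if other == word:
            continue  # skip the query word itself
        distances.append((math.dist(target, vec), other))
    distances.sort()
    return [other for _, other in distances[:k]]

# Invented toy data: five words with hand-picked 2-D vectors.
toy_vocab = ["day", "week", "year", "he", "she"]
toy_embeddings = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.0], [-0.5, 0.9], [-0.6, 0.8]]
closest = nearest_words("day", toy_vocab, toy_embeddings, 2)
```

With these hand-picked vectors, `closest` is `["week", "year"]`. On the trained model, use the provided MATLAB/Octave functions and the learned embeddings instead.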

Question 10

In the model trained in Question 9, why is the word 'percent' close to 'dr.' even though they have very different contexts and are not expected to be close in word embedding space? [2 points]

Question 11

In the model trained in Question 9, why is 'he' close to 'she' even though they refer to completely different genders? [2 points]

Question 12

In conclusion, what kind of words does the model put close to each other in embedding space? Choose the most appropriate answer. [3 points]