[Translation] Stanford CS 20SI: TensorFlow for Deep Learning Research course notes, Lecture note 4: How to structure your model in TensorFlow

“CS 20SI: TensorFlow for Deep Learning Research”

Prepared by Chip Huyen
Reviewed by Danijar Hafner

Lecture note 4: How to structure your model in TensorFlow


Skip-gram model vs. CBOW (Continuous Bag-of-Words) model:
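Skip-gram predicts the surrounding context words from a center word, while CBOW predicts the center word from its context. A minimal sketch of how the two models would slice the same sentence into training examples, assuming a symmetric window; the function names and the toy sentence are illustrative and not taken from the course code:

```python
def skip_gram_pairs(tokens, window):
    """One (center, context) pair per context word: the center word is the input."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window):
    """One (context_list, center) example per position: the context is the input."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


sentence = "the quick brown fox".split()
sg = skip_gram_pairs(sentence, window=1)
cb = cbow_examples(sentence, window=1)
```

Skip-gram produces many small (input, label) pairs per position, whereas CBOW produces one example per position with several input words.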


Word2Vec Tutorial


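How to structure a TensorFlow model

  • Phase 1: assemble the graph:

    1. Define placeholders for input and output
    2. Define the weights
    3. Define the inference model
    4. Define the loss function
    5. Define the optimizer
  • Phase 2: execute the computation:

    1. Initialize all model variables before the first run
    2. Feed in the training data; this may involve shuffling the data samples
    3. Run the inference model on the training data to compute the output for the current input under the current model parameters
    4. Compute the loss
    5. Adjust the model parameters to minimize/maximize the loss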


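Phase 1: assemble the graph:

  1. Define placeholders for input and output

    center_words = tf.placeholder(tf.int32, shape=[BATCH_SIZE])
    # tf.nn.nce_loss expects labels of shape [batch_size, num_true], hence the extra dimension
    target_words = tf.placeholder(tf.int32, shape=[BATCH_SIZE, 1])

  2. Define the weights (the embedding matrix)

    embed_matrix = tf.Variable(tf.random_uniform([VOCAB_SIZE, EMBED_SIZE], -1.0, 1.0))

  3. Inference (the forward pass of the graph). The lookup op has the signature:

    tf.nn.embedding_lookup(params, ids, partition_strategy='mod', name=None, validate_indices=True, max_norm=None)

  The embeddings of the center words are then obtained with:

    embed = tf.nn.embedding_lookup(embed_matrix, center_words)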
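Conceptually, an embedding lookup just selects rows of the matrix by index. That gives the same result as multiplying a one-hot vector by the matrix, without actually doing the multiplication. A small pure-Python sketch of that equivalence, using a made-up 4x3 matrix rather than the model's real weights:

```python
def embedding_lookup(matrix, ids):
    """Return the rows of `matrix` selected by `ids` (row selection only)."""
    return [matrix[i] for i in ids]


def one_hot_matmul(matrix, ids):
    """Same result via an explicit one-hot vector times the matrix."""
    vocab_size, embed_size = len(matrix), len(matrix[0])
    rows = []
    for i in ids:
        one_hot = [1.0 if k == i else 0.0 for k in range(vocab_size)]
        rows.append([sum(one_hot[k] * matrix[k][d] for k in range(vocab_size))
                     for d in range(embed_size)])
    return rows


toy_matrix = [[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6],
              [0.7, 0.8, 0.9],
              [1.0, 1.1, 1.2]]
center_ids = [2, 0]
looked_up = embedding_lookup(toy_matrix, center_ids)
```

The one-hot version touches every row of the matrix for every id, which is why a direct lookup is preferred for large vocabularies.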

  4. Define the loss function. TensorFlow provides NCE loss with the signature:

    tf.nn.nce_loss(weights, biases, labels, inputs, num_sampled, num_classes, num_true=1, sampled_values=None, remove_accidental_hits=False, partition_strategy='mod', name='nce_loss')

  It needs a weight matrix and a bias vector for the output layer:

    nce_weight = tf.Variable(tf.truncated_normal([VOCAB_SIZE, EMBED_SIZE],
                                                 stddev=1.0 / (EMBED_SIZE ** 0.5)),
                             name='nce_weight')
    nce_bias = tf.Variable(tf.zeros([VOCAB_SIZE]), name='nce_bias')

  The loss is the mean NCE loss over the batch:

    loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight,
                                         biases=nce_bias,
                                         labels=target_words,
                                         inputs=embed,
                                         num_sampled=NUM_SAMPLED,
                                         num_classes=VOCAB_SIZE), name='loss')
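NCE replaces the expensive softmax over the whole vocabulary with a binary classification task: tell the true target word apart from a few sampled noise words. The sketch below is a simplified, negative-sampling-style version of that idea for a single training example in plain Python. It omits the noise-distribution correction that full NCE applies to the logits, and the vectors, biases and function names are illustrative assumptions:

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sampled_binary_loss(embed, true_weight, true_bias, noise_weights, noise_biases):
    """Logistic loss: label 1 for the true word, label 0 for each noise word."""
    true_logit = dot(true_weight, embed) + true_bias
    loss = -log_sigmoid(true_logit)
    for w, b in zip(noise_weights, noise_biases):
        noise_logit = dot(w, embed) + b
        loss += -log_sigmoid(-noise_logit)
    return loss


embed = [0.5, -0.2]
# Output weights that score the true word high and the noise word low ...
good = sampled_binary_loss(embed, [2.0, -1.0], 0.0, [[-1.0, 0.5]], [0.0])
# ... versus weights that do the opposite.
bad = sampled_binary_loss(embed, [-2.0, 1.0], 0.0, [[1.0, -0.5]], [0.0])
```

The cost per example grows with the number of sampled noise words (`num_sampled` in the TensorFlow call) rather than with the vocabulary size.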
  5. Define the optimizer

    optimizer = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(loss)



Phase 2: execute the computation:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    total_loss = 0.0  # used to compute the average loss over the last SKIP_STEP steps
    writer = tf.summary.FileWriter('./my_graph/no_frills/', sess.graph)
    for index in range(NUM_TRAIN_STEPS):
        centers, targets = next(batch_gen)
        loss_batch, _ = sess.run([loss, optimizer],
                                 feed_dict={center_words: centers, target_words: targets})
        total_loss += loss_batch
        if (index + 1) % SKIP_STEP == 0:
            print('Average loss at step {}: {:5.1f}'.format(index, total_loss / SKIP_STEP))
            total_loss = 0.0
    writer.close()
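The training loop pulls `(centers, targets)` batches from `batch_gen`, which is not defined in this note. A hypothetical generator is sketched below, assuming the data arrives as an iterable of (center_id, target_id) pairs. It yields centers as a flat list of length batch_size and targets as a nested [batch_size, 1] list, which is the label layout `tf.nn.nce_loss` expects. The name `make_batches` and the toy pairs are inventions for illustration:

```python
def make_batches(pairs, batch_size):
    """Yield (centers, targets) batches; a final incomplete batch is dropped."""
    centers, targets = [], []
    for center, target in pairs:
        centers.append(center)
        targets.append([target])  # one true label per example -> shape [batch, 1]
        if len(centers) == batch_size:
            yield centers, targets
            centers, targets = [], []


toy_pairs = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3)]
batch_gen = make_batches(toy_pairs, batch_size=2)
first_centers, first_targets = next(batch_gen)
```

Dropping the incomplete last batch keeps every batch compatible with placeholders that have a fixed batch dimension.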





import os

import tensorflow as tf


class SkipGramModel:
    """ Build the graph for word2vec model """
    def __init__(self, vocab_size, embed_size, batch_size, num_sampled, learning_rate):
        self.vocab_size = vocab_size
        self.embed_size = embed_size
        self.batch_size = batch_size
        self.num_sampled = num_sampled
        self.lr = learning_rate
        self.global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')

    def _create_placeholders(self):
        """ Step 1: define the placeholders for input and output """
        with tf.name_scope("data"):
            self.center_words = tf.placeholder(tf.int32, shape=[self.batch_size], name='center_words')
            self.target_words = tf.placeholder(tf.int32, shape=[self.batch_size, 1], name='target_words')

    def _create_embedding(self):
        """ Step 2: define weights. In word2vec, it's actually the weights that we care about """
        # Assemble this part of the graph on the CPU. You can change it to GPU if you have GPU
        with tf.device('/cpu:0'):
            with tf.name_scope("embed"):
                self.embed_matrix = tf.Variable(tf.random_uniform([self.vocab_size,
                                                                   self.embed_size], -1.0, 1.0),
                                                name='embed_matrix')

    def _create_loss(self):
        """ Step 3 + 4: define the model + the loss function """
        with tf.device('/cpu:0'):
            with tf.name_scope("loss"):
                # Step 3: define the inference
                embed = tf.nn.embedding_lookup(self.embed_matrix, self.center_words, name='embed')

                # Step 4: define loss function
                # construct variables for NCE loss
                nce_weight = tf.Variable(tf.truncated_normal([self.vocab_size, self.embed_size],
                                                             stddev=1.0 / (self.embed_size ** 0.5)),
                                         name='nce_weight')
                # use the instance's vocab_size rather than a global constant
                nce_bias = tf.Variable(tf.zeros([self.vocab_size]), name='nce_bias')

                # define loss function to be NCE loss function
                self.loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight,
                                                          biases=nce_bias,
                                                          labels=self.target_words,
                                                          inputs=embed,
                                                          num_sampled=self.num_sampled,
                                                          num_classes=self.vocab_size), name='loss')

    def _create_optimizer(self):
        """ Step 5: define optimizer """
        with tf.device('/cpu:0'):
            self.optimizer = tf.train.GradientDescentOptimizer(self.lr).minimize(self.loss,
                                                                                 global_step=self.global_step)

    def _create_summaries(self):
        with tf.name_scope("summaries"):
            tf.summary.scalar("loss", self.loss)
            tf.summary.histogram("histogram_loss", self.loss)
            # because you have several summaries, we should merge them all
            # into one op to make it easier to manage
            self.summary_op = tf.summary.merge_all()

    def build_graph(self):
        """ Build the graph for our model """
        self._create_placeholders()
        self._create_embedding()
        self._create_loss()
        self._create_optimizer()
        self._create_summaries()


def train_model(model, batch_gen, num_train_steps, weights_fld):
    saver = tf.train.Saver()  # defaults to saving all variables - in this case embed_matrix, nce_weight, nce_bias

    initial_step = 0
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        ckpt = tf.train.get_checkpoint_state(os.path.dirname('checkpoints/checkpoint'))
        # if that checkpoint exists, restore from checkpoint
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)

        total_loss = 0.0  # used to compute the average loss over the last SKIP_STEP steps
        writer = tf.summary.FileWriter('improved_graph/lr' + str(LEARNING_RATE), sess.graph)
        initial_step = model.global_step.eval()
        for index in range(initial_step, initial_step + num_train_steps):
            centers, targets = next(batch_gen)
            feed_dict = {model.center_words: centers, model.target_words: targets}
            loss_batch, _, summary = sess.run([model.loss, model.optimizer, model.summary_op],
                                              feed_dict=feed_dict)
            writer.add_summary(summary, global_step=index)
            total_loss += loss_batch
            if (index + 1) % SKIP_STEP == 0:
                print('Average loss at step {}: {:5.1f}'.format(index, total_loss / SKIP_STEP))
                total_loss = 0.0
                saver.save(sess, 'checkpoints/skip-gram', index)



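t-SNE (from Wikipedia)
t-distributed stochastic neighbor embedding (t-SNE) is a machine learning algorithm for dimensionality reduction developed by Geoffrey Hinton and Laurens van der Maaten. It is a nonlinear dimensionality reduction technique that is particularly well suited to embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot. Specifically, it models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points. The t-SNE algorithm comprises two main stages. First, t-SNE constructs a probability distribution over pairs of high-dimensional objects such that similar objects have a high probability of being picked, while dissimilar points have an extremely small probability of being picked. Second, t-SNE defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the Kullback-Leibler divergence between the two distributions with respect to the locations of the points in the map.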
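To make the two stages concrete, here is a heavily simplified sketch in plain Python: Gaussian similarities between high-dimensional points, Student-t similarities between their low-dimensional counterparts, and the KL divergence between the two distributions. It assumes one fixed bandwidth `sigma` for all points, whereas real t-SNE tunes a bandwidth per point from a perplexity setting and then minimizes the divergence by gradient descent. The points and all names are made up for illustration:

```python
import math


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pair_probs(points, kernel):
    """Normalize kernel(squared_distance) over all ordered pairs i != j."""
    n = len(points)
    weights = {(i, j): kernel(sq_dist(points[i], points[j]))
               for i in range(n) for j in range(n) if i != j}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


def kl_divergence(p, q):
    """KL(P || Q) = sum over pairs of p_ij * log(p_ij / q_ij)."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0)


sigma = 1.0
high = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [3.0, 3.0, 3.0]]
good_map = [[0.0, 0.0], [0.1, 0.1], [4.0, 4.0]]  # keeps the two neighbors close
bad_map = [[0.0, 0.0], [4.0, 4.0], [0.1, 0.1]]   # swaps a neighbor with the outlier

p = pair_probs(high, lambda d2: math.exp(-d2 / (2 * sigma ** 2)))
q_good = pair_probs(good_map, lambda d2: 1.0 / (1.0 + d2))
q_bad = pair_probs(bad_map, lambda d2: 1.0 / (1.0 + d2))
kl_good, kl_bad = kl_divergence(p, q_good), kl_divergence(p, q_bad)
```

A layout that preserves the neighborhoods of the original data gives a smaller divergence than one that scrambles them, which is the quantity t-SNE drives down while moving the low-dimensional points.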




from tensorflow.contrib.tensorboard.plugins import projector

# obtain the embedding_matrix after you've trained it
final_embed_matrix = sess.run(model.embed_matrix)

# create a variable to hold your embeddings. It has to be a variable. Constants
# don't work. You also can't just use the embed_matrix we defined earlier for our model. Why
# is that so? I don't know. I get the 500 most popular words.
embedding_var = tf.Variable(final_embed_matrix[:500], name='embedding')
sess.run(embedding_var.initializer)

config = projector.ProjectorConfig()
summary_writer = tf.summary.FileWriter(LOGDIR)

# add embeddings to config
embedding = config.embeddings.add()
embedding.tensor_name = embedding_var.name

# link the embeddings to their metadata file. In this case, the file that contains
# the 500 most popular words in our vocabulary
embedding.metadata_path = LOGDIR + '/vocab_500.tsv'

# save a configuration file that TensorBoard will read during startup
projector.visualize_embeddings(summary_writer, config)

# save our embedding
saver_embed = tf.train.Saver([embedding_var])
saver_embed.save(sess, LOGDIR + '/skip-gram.ckpt', 1)
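The projector code points TensorBoard at `vocab_500.tsv` but does not show how that file is produced. One plausible way is sketched here, assuming the vocabulary is ranked by frequency so that row i of the saved embedding corresponds to the i-th most common word. The helper name, the toy corpus and the temporary directory are hypothetical. For a single-column metadata file, the projector expects one label per line with no header row:

```python
import os
import tempfile
from collections import Counter


def write_vocab_metadata(words, path, top_k):
    """Write the top_k most frequent words, one per line, most frequent first."""
    most_common = [word for word, _ in Counter(words).most_common(top_k)]
    with open(path, "w") as f:
        for word in most_common:
            f.write(word + "\n")
    return most_common


corpus = "a b a c b a d".split()
logdir = tempfile.mkdtemp()  # stand-in for LOGDIR
metadata_path = os.path.join(logdir, "vocab_3.tsv")
top_words = write_vocab_metadata(corpus, metadata_path, top_k=3)
```

The line order in the file has to match the row order of the embedding variable, so the same frequency ranking should be used when assigning word ids.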



