[翻译]斯坦福CS 20SI:基于Tensorflow的深度学习研究课程笔记,Lecture note 4: How to structure your model in TensorFlow
来源:互联网 发布:乌海四中网络选课系统 编辑:程序博客网 时间:2024/06/06 19:21
“CS 20SI: TensorFlow for Deep Learning Research”
Prepared by Chip Huyen
Reviewed by Danijar Hafner
Lecture note 4: How to structure your model in TensorFlow
个人翻译,部分内容较简略,建议参考原note阅读
本节课建立word2vec
模型,不熟悉可以阅读CS224N的课件Mikolov的原始论文
Skip - gram模型 vs CBOW模型( Continuous Bag - of - Words):
算法上是相似的,不同的是CBOW根据上下文预测中心词,Skip-gram正好相反,统计上来讲,C通过把全部上下文当做一个观测值而使大量分布信息更为平滑,对于比较小的数据集这种方法很有用,S把(上下文-目标)对当做一个观测值,这使数据集更大的时候更有效
Word2Vec Tutorial
建立skip-gram模型,我们更关心隐藏层的权重,权重是我们所尝试学习的,也叫词向量矩阵
如何建造 tensorflow 模型
阶段1: 建造图:
- 定义输入输出的占位符
- 定义权重
- 定义模型inference
- 定义损失函数
- 定义optimizer
阶段2:执行计算:
- 给第一次执行初始变量
- feed训练数据,可能需要随机化数据样本
- 在训练数据下执行模型inference,计算当前输入和当前模型参数的输出
- 计算损失
- 通过最小/大化模型损失调整参数
让我们根据这些步骤穿件w2v,Skip-gram模型:
阶段1: 构造图:
- 定义输入输出的占位符
输入中心词,输出目标词,使用词列表索引代替one-hot向量,[batch_size]个标量输入和输出
center_words = tf.placeholder( tf.int32 , shape =[ BATCH_SIZE ]) target_words = tf.placeholder( tf.int32 , shape =[ BATCH_SIZE ])
- 定义权重(词向量矩阵)
每行表示一个单词的词向量,每个词向量长度为EMBED_SIZE,那么词向量矩阵就是[VOCAB_SIZE,EMBED_SIZE],我们使用均匀分布初始化词向量矩阵
embed_matrix = tf.Variable(tf.random_uniform([ VOCAB_SIZE , EMBED_SIZE ], - 1.0 , 1.0 ))
- inference(图前向传播通路)
tf.nn.embedding_lookup( params , ids , partition_strategy = 'mod' , name = None , validate_indices = True , max_norm = None)
使用以上方法可以通过词向量矩阵转换单词索引为词向量
embed = tf.nn.embedding_lookup( embed_matrix , center_words)
4 定义损失函数
NCE用py实现很复杂,tf自带的实现:
tf.nn.nce_loss( weights , biases , labels , inputs , num_sampled , num_classes , num_true = 1 , sampled_values = None , remove_accidental_hits = False , partition_strategy = 'mod' , name = 'nce_loss')
我们需要隐藏层的权重和偏差去计算NCE损失
nce_weight = tf.Variable(tf.truncated_normal([VOCAB_SIZE, EMBED_SIZE], stddev=1.0 / (EMBED_SIZE ** 0.5)), name='nce_weight') nce_bias = tf.Variable(tf.zeros([VOCAB_SIZE]), name='nce_bias')
然后定义损失:
loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight, biases=nce_bias, labels=target_words, inputs=embed, num_sampled=NUM_SAMPLED, num_classes=VOCAB_SIZE), name='loss')
- 定义optimizer
optimizer = tf.train.GradientDescentOptimizer(LEARNING_RATE ).minimize(loss)
阶段2:执行计算
建立session,给占位符feed输入和输出,运行优化器最小化loss,取回loss值
with tf.Session() as sess: sess.run(tf.global_variables_initializer()) total_loss = 0.0 # we use this to calculate late average loss in the last SKIP_STEP steps writer = tf.summary.FileWriter('./my_graph/no_frills/', sess.graph) for index in xrange(NUM_TRAIN_STEPS): centers, targets = batch_gen.next() loss_batch, _ = sess.run([loss, optimizer], feed_dict={center_words: centers, target_words: targets}) total_loss += loss_batch if (index + 1) % SKIP_STEP == 0: print('Average loss at step {}: {:5.1f}'.format(index, total_loss / SKIP_STEP)) total_loss = 0.0 writer.close()
nameScope
給张量命名并且在tensorboard中观察:
看起来乱七八糟的
使用tf.name_scope(name)将节点分类,节点会显示为一个scope,点击scope可以显示内部细节.你会发现tensorboard有节点两种线,一种是实线,另一种是虚线,实线代表数据流,虚线是节点所依赖的操作.
完整的节点图标
之前我们建立了一个简单的顺序模型,我们可以使用py的面向对象方法编写一个易于重用的模型,把Skip_gram建立为一个类:
class SkipGramModel: """ Build the graph for word2vec model """ def __init__(self, vocab_size, embed_size, batch_size, num_sampled, learning_rate): self.vocab_size = vocab_size self.embed_size = embed_size self.batch_size = batch_size self.num_sampled = num_sampled self.lr = learning_rate self.global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step') def _create_placeholders(self): """ Step 1: define the placeholders for input and output """ with tf.name_scope("data"): self.center_words = tf.placeholder(tf.int32, shape=[self.batch_size], name='center_words') self.target_words = tf.placeholder(tf.int32, shape=[self.batch_size, 1], name='target_words') def _create_embedding(self): """ Step 2: define weights. In word2vec, it's actually the weights that we care about """ # Assemble this part of the graph on the CPU. You can change it to GPU if you have GPU with tf.device('/cpu:0'): with tf.name_scope("embed"): self.embed_matrix = tf.Variable(tf.random_uniform([self.vocab_size, self.embed_size], -1.0, 1.0), name='embed_matrix') def _create_loss(self): """ Step 3 + 4: define the model + the loss function """ with tf.device('/cpu:0'): with tf.name_scope("loss"): # Step 3: define the inference embed = tf.nn.embedding_lookup(self.embed_matrix, self.center_words, name='embed') # Step 4: define loss function # construct variables for NCE loss nce_weight = tf.Variable(tf.truncated_normal([self.vocab_size, self.embed_size], stddev=1.0 / (self.embed_size ** 0.5)), name='nce_weight') nce_bias = tf.Variable(tf.zeros([VOCAB_SIZE]), name='nce_bias') # define loss function to be NCE loss function self.loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight, biases=nce_bias, labels=self.target_words, inputs=embed, num_sampled=self.num_sampled, num_classes=self.vocab_size), name='loss') def _create_optimizer(self): """ Step 5: define optimizer """ with tf.device('/cpu:0'): self.optimizer = tf.train.GradientDescentOptimizer(self.lr).minimize(self.loss, global_step=self.global_step) def _create_summaries(self): with tf.name_scope("summaries"): tf.summary.scalar("loss", self.loss) tf.summary.histogram("histogram loss", self.loss) # because you have several summaries, we should merge them all # into one op to make it easier to manage self.summary_op = tf.summary.merge_all() def build_graph(self): """ Build the graph for our model """ self._create_placeholders() self._create_embedding() self._create_loss() self._create_optimizer() self._create_summaries()def train_model(model, batch_gen, num_train_steps, weights_fld): saver = tf.train.Saver() # defaults to saving all variables - in this case embed_matrix, nce_weight, nce_bias initial_step = 0 with tf.Session() as sess: sess.run(tf.global_variables_initializer()) ckpt = tf.train.get_checkpoint_state(os.path.dirname('checkpoints/checkpoint')) # if that checkpoint exists, restore from checkpoint if ckpt and ckpt.model_checkpoint_path: saver.restore(sess, ckpt.model_checkpoint_path) total_loss = 0.0 # we use this to calculate late average loss in the last SKIP_STEP steps writer = tf.summary.FileWriter('improved_graph/lr' + str(LEARNING_RATE), sess.graph) initial_step = model.global_step.eval() for index in xrange(initial_step, initial_step + num_train_steps): centers, targets = batch_gen.next() feed_dict={model.center_words: centers, model.target_words: targets} loss_batch, _, summary = sess.run([model.loss, model.optimizer, model.summary_op], feed_dict=feed_dict) writer.add_summary(summary, global_step=index) total_loss += loss_batch if (index + 1) % SKIP_STEP == 0: print('Average loss at step {}: {:5.1f}'.format(index, total_loss / SKIP_STEP)) total_loss = 0.0 saver.save(sess, 'checkpoints/skip-gram', index)
通过t-SNE我们可以把我们的词向量矩阵可视化,我们可以看到所有的数字被分类到右下角一行,挨着字母和名字,所有月份被分类到一组,所有”do,does,did”被分为一组等等.
如果你打印’American’接近的词:
t - SNE(来自维基百科)
t - 分布随机相邻嵌入(t-SNE)是一种由Geoffrey Hinton和Laurens van der Maaten开发的机器学习降维算法。它是一种非线性降维技术,其特别适合将高维数据转换到二维或三维空间,然后以散点图可视化。具体来说,它通过一个二或三维空间来模拟每个高维对维点,使得类似对象由附近的点和不相似的对象的远近建模。t-SNE算法包括两个主要阶段。首先,t-SNE构建类似对象具有被选择的高概率,而不相似的点具有非常小被挑选的概率。第二,t-SNE定义了类似的概率分布低维地图中的点,并且在相对于地图中的点的位置的两个分布之间最小化Kullback-Leibler散度
t-SNE使MNIST数据集可视化
我们也可以使用PCA使数据可视化
我们可以用不到十行的代码使数据可视化,tensorboard提供了很好的工具:
from tensorflow.contrib.tensorboard.plugins import projector# obtain the embedding_matrix after you’ve trained itfinal_embed_matrix = sess . run ( model . embed_matrix)# create a variable to hold your embeddings. It has to be a variable. Constants# don’t work. You also can’t just use the embed_matrix we defined earlier for our model. Why# is that so? I don’t know. I get the 500 most popular words.embedding_var = tf.Variable(final_embed_matrix[:500], name='embedding')sess.run(embedding_var.initializer)config = projector.ProjectorConfig()summary_writer = tf.summary.FileWriter(LOGDIR)# add embeddings to configembedding = config.embeddings.add()embedding.tensor_name = embedding_var.name# link the embeddings to their metadata file. In this case, the file that contains# the 500 most popular words in our vocabularyembedding.metadata_path = LOGDIR + '/vocab_500.tsv'# save a configuration file that TensorBoard will read during startupprojector.visualize_embeddings(summary_writer, config)# save our embeddingsaver_embed = tf.train.Saver([embedding_var])saver_embed.save(sess, LOGDIR + '/skip-gram.ckpt', 1)
为什么我们依然要了解梯度
虽然目前为止我们建立的模型都没有获取单个节点的梯度,因为tf会自动考虑反向传播,但是我们依然要学会如何获取梯度,因为tf并不能分辨梯度消失或者梯度爆炸的情况,我们需要了解模型的梯度去获知模型是否正常工作
- [翻译]斯坦福CS 20SI:基于Tensorflow的深度学习研究课程笔记,Lecture note 4: How to structure your model in TensorFlow
- [翻译]斯坦福CS 20SI:基于Tensorflow的深度学习研究课程笔记,Lecture note 2: TensorFlow Ops
- [翻译]斯坦福CS 20SI:基于Tensorflow的深度学习研究课程笔记,Lecture note 3: Linear and Logistic Regression in TensorFlow
- [翻译]斯坦福CS20SI:基于Tensorflow的深度学习研究课程笔记,Lecture note 1: Introduction to TensorFlow
- CS 20SI|Lecture 4 Structure your TensorFlow model
- 斯坦福CS20SI:基于Tensorflow的深度学习研究课程笔记,Lecture note2:TensorFlow Ops
- 斯坦福CS20SI:基于Tensorflow的深度学习研究课程笔记,Lecture note3:TensorFlow上的线性回归和逻辑回归
- Lecture note 1: Introduction to TensorFlow(斯坦福Tensorflow课程)
- CS 20SI|Lecture 7 Playing with convolutions in TensorFlow
- CS 20SI|Lecture 11 RNNs in TensorFlow
- CS 20SI|Lecture 1 Introduction to TensorFlow
- 【CS 20SI】TensorFlow for Deep Learning Research Lecture note 2: TensorFlow Ops
- CS 20SI|Lecture 2 TensorFlow Ops
- CS 20SI|Lecture 3 Linear and Logistic Regression in TensorFlow
- [note] deep learning tensorflow lecture 1 notes 深度学习笔记 (1)
- [note] deep learning tensorflow lecture 2 notes 深度学习笔记 (2) 解决过拟合
- Tensorflow-4-模型结构化Structure your model
- How to use tensorflow pre-trained model in Android
- python3爬虫的编码问题
- bzoj 1271 二分
- CentOS 搭建的Java环境
- JavaScript快速入门
- mykernel实验指导(操作系统是如何工作的)
- [翻译]斯坦福CS 20SI:基于Tensorflow的深度学习研究课程笔记,Lecture note 4: How to structure your model in TensorFlow
- Zipline的安装教程
- TCP/IP基础四
- APP开发实战159-广播的安全性
- 欢迎浏览GitHub+Hexo搭建的博客(有搭建教程)
- ionic 创建项目步骤
- A Simple Algebraic Way to understand Quaternions and Rotations in 3D
- Spring中事务
- JDBC PreparedStatement example – Batch Update