Masking in TensorFlow
import tensorflow as tf

# loss has shape [batch, num_steps]; seqlen has shape [batch]
def mask(self, loss, seqlen):
    # 1 for real time steps, 0 for padding; use loss's dtype so the multiplication is valid
    mask = tf.sequence_mask(seqlen, maxlen=config.num_steps, dtype=loss.dtype)
    # masked mean: sum only the real per-step losses and divide by the number of real steps
    clear_loss = tf.reduce_sum(loss * mask) / tf.reduce_sum(mask)
    return clear_loss
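As a quick sanity check of the function above, here is a minimal standalone sketch, assuming TensorFlow 1.x graph mode and config.num_steps = 5; the placeholder values are made up for illustration:

import tensorflow as tf

loss = tf.placeholder(tf.float32, [None, 5])   # per-step losses, shape [batch, num_steps]
seqlen = tf.placeholder(tf.int32, [None])      # true length of each sequence in the batch
msk = tf.sequence_mask(seqlen, maxlen=5, dtype=loss.dtype)
clear_loss = tf.reduce_sum(loss * msk) / tf.reduce_sum(msk)

with tf.Session() as sess:
    # only the first 3 of 5 steps are real, so the masked mean is (1+2+3)/3 = 2.0, not 6/5 = 1.2
    print(sess.run(clear_loss, feed_dict={loss: [[1., 2., 3., 0., 0.]], seqlen: [3]}))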
The code above was inspired by the following Q&A:
Question:
Hi,
Say I want to train some LSTM unit, and my training data has variable lengths with a maximum length of, say, 30.
What is the right thing to do?
In TF we cannot dynamically create a computation graph of varying length, so the number of LSTM unrolling steps is fixed.
So do we have to pad everything to have a length of 30?
Let's say my input is a sequence of symbols from a certain alphabet; do I have to add a "NUL" symbol to my alphabet, so that my input now looks like:
w1, w2, … wn, NUL, NUL, NUL, NUL…
This is what I am doing now. However, I think this is wrong, as the LSTM will now learn some additional behaviour when consuming the (artificial) NUL symbols.
I’m worried that models trained this way won’t be able to generalize well when the length is not bound to 30.
Thanks!
–evan
Answer:
For transduction problems (1:1 between sequence input and target), the general approach, I think, is to allow the RNN to run over these NUL values, but then apply a mask to zero out the cost associated with them.
E.g. for the sequence [w1, w2, w3, NUL, NUL, NUL],
you first calculate the per-element costs, say costs = [3.1, 4.1, 5.9, 2.6, 5.3, 5.8].
Usually you'd take the mean, np.mean(costs) ≈ 4.47, but in this case you don't care about the last three.
So now you maintain a mask, 0 for NUL and 1 otherwise: mask = [1, 1, 1, 0, 0, 0],
and you calculate your sequence cost using this mask to zero out the costs you don't care about:
sequence_cost = np.sum(costs * mask) / np.sum(mask)
(Note: NOT np.mean(costs * mask), since the effective sequence "length" has changed from 6 to 3.)
It's "wasteful" in the sense that you're doing more work than the unpadded version, but the argument is that the more densely packed data makes up for this through the speed-up in the lower-level libraries.
There are lots of examples of this in the TensorFlow seq2seq models.
See http://www.tensorflow.org/tutorials/seq2seq/index.html ("bucketing and padding") for the high-level view of this (plus the extended idea of bucketing),
and https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/seq2seq.py for more detail in code.
Original thread: https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/wk8sbFGyfHA
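The recipe in the answer can be checked directly in NumPy; a minimal sketch using the numbers quoted above:

import numpy as np

costs = np.array([3.1, 4.1, 5.9, 2.6, 5.3, 5.8])   # per-element costs for [w1, w2, w3, NUL, NUL, NUL]
mask = np.array([1., 1., 1., 0., 0., 0.])           # 1 for real symbols, 0 for NUL padding
sequence_cost = np.sum(costs * mask) / np.sum(mask) # 13.1 / 3 ≈ 4.37
plain_mean = np.mean(costs)                         # ≈ 4.47, dragged around by the padded positions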