Masking in TensorFlow

    # loss has shape [batch, num_steps]; seqlen has shape [batch]
    def mask(self, loss, seqlen):
        mask = tf.sequence_mask(seqlen, maxlen=config.num_steps, dtype=loss.dtype)
        # sum only over the valid (unpadded) steps and average by their count
        clear_loss = tf.reduce_sum(loss * mask) / tf.reduce_sum(mask)
        return clear_loss
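A self-contained sketch of the same computation, assuming TF 2.x eager execution (the batch size, step count, losses and lengths below are illustrative values, not from the original code):

    import tensorflow as tf

    num_steps = 4                                   # fixed unrolled length

    # per-time-step losses, shape [batch, num_steps]
    loss = tf.constant([[3.1, 4.1, 5.9, 2.6],
                        [1.0, 2.0, 0.5, 0.7]])
    seqlen = tf.constant([3, 2])                    # true lengths, shape [batch]

    mask = tf.sequence_mask(seqlen, maxlen=num_steps, dtype=loss.dtype)
    masked_mean = tf.reduce_sum(loss * mask) / tf.reduce_sum(mask)
    print(masked_mean.numpy())                      # averages only the 3 + 2 = 5 valid entries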

The implementation of this snippet was inspired by the Q&A below:

Question:
Hi,

Say I want to train some LSTM unit, and my training data has variable lengths with a maximum length of say, 30.
What is the right thing to do?

In TF we cannot dynamically create a computation graph of varying length, so the number of LSTM unrolling steps is fixed.
So do we have to pad everything to have a length of 30?

Let’s say my input is a sequence of symbols from a certain alphabet, do I have to add a “NUL” symbol to my alphabet, so that my input now looks like:
w1, w2, … wn, NUL, NUL, NUL, NUL…
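A minimal sketch of this padding scheme (the toy alphabet, the reserved NUL id of 0 and max_len = 30 are illustrative assumptions, not part of the original post):

    NUL = 0                                  # reserved padding id
    vocab = {"w1": 1, "w2": 2, "w3": 3}      # toy alphabet
    max_len = 30                             # fixed unrolled length

    def pad(symbols):
        ids = [vocab[s] for s in symbols]
        return ids + [NUL] * (max_len - len(ids))   # w1 .. wn, NUL, NUL, ...

    print(pad(["w1", "w2", "w3"]))           # length-30 list ending in 27 zeros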

This is what I am doing now. However I think this is wrong as the LSTM now will learn some additional behaviours when consuming the (artificial) NUL symbol.
I’m worried that models trained this way won’t be able to generalize well when the length is not bound to 30.

Thanks!
–evan

Answer:
For transduction problems (1:1 between sequence input and target), the general approach, I think, is to let the RNN run over these NUL values but then apply a mask to zero out the cost associated with them.

eg for sequence [w1, w2, w3, NUL, NUL, NUL]
you first calculate the per element costs, say, costs = [3.1, 4.1, 5.9, 2.6, 5.3, 5.8]

usually you’d take the mean; np.mean(costs) ≈ 4.47, but in this case you don’t care about the last three.

so now you’ll maintain a mask, 0 for NUL and 1 otherwise, mask = [1,1,1,0,0,0]
and you’ll calculate your sequence cost using this mask to zero out the costs you don’t care about;
sequence_cost = np.sum(costs * mask) / np.sum(mask)
(note! NOT np.mean(costs * mask) since the effective sequence “length” has changed from 6 to 3)
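The same calculation in numpy, as a runnable transcription of the numbers in this answer:

    import numpy as np

    costs = np.array([3.1, 4.1, 5.9, 2.6, 5.3, 5.8])     # per-element costs
    mask  = np.array([1., 1., 1., 0., 0., 0.])            # 1 for real symbols, 0 for NUL

    plain_mean    = np.mean(costs)                        # ~4.47, padding included
    sequence_cost = np.sum(costs * mask) / np.sum(mask)   # ~4.37, padding masked out
    print(plain_mean, sequence_cost)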

it’s “wasteful” in the sense that you’re doing more work than the unpadded version, but the argument is that the densely packed data makes up for it through the speed-up in the lower-level libraries

there are lots of examples of this in the tensorflow seq2seq models
see http://www.tensorflow.org/tutorials/seq2seq/index.html “bucketing and padding” for the high level view of this (+ the extended idea of bucketing)
and https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/seq2seq.py for more detail in code

Original thread: https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/wk8sbFGyfHA
