The legacy_seq2seq module in TensorFlow
Source: Internet   Editor: 程序博客网   Time: 2024/05/20 15:57
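TensorFlow is introducing a new set of seq2seq interfaces and has moved the previous seq2seq code under legacy_seq2seq. The code read today comes from there. A lot of existing code still uses the old seq2seq interface, so it is still worth getting familiar with it.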
_extract_argmax_and_embed
def _extract_argmax_and_embed(embedding,
                              output_projection=None,
                              update_embedding=True):
  """Get a loop_function that extracts the previous symbol and embeds it.

  Args:
    embedding: embedding tensor for symbols.
    output_projection: None or a pair (W, B). If provided, each fed previous
      output will first be multiplied by W and added B.
    update_embedding: Boolean; if False, the gradients will not propagate
      through the embeddings.

  Returns:
    A loop function.
  """

  def loop_function(prev, _):
    if output_projection is not None:
      prev = nn_ops.xw_plus_b(prev, output_projection[0], output_projection[1])
    prev_symbol = math_ops.argmax(prev, 1)
    # Note that gradients will not propagate through the second parameter of
    # embedding_lookup.
    emb_prev = embedding_ops.embedding_lookup(embedding, prev_symbol)
    if not update_embedding:
      emb_prev = array_ops.stop_gradient(emb_prev)
    return emb_prev

  return loop_function
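To make the data flow of the loop function concrete, here is a minimal pure-Python sketch of the same three steps (optional projection, argmax, embedding lookup). It is only an illustration under assumptions: make_loop_function is a hypothetical name, tensors are replaced by nested lists, and gradients (and therefore update_embedding) are not modelled at all.

```python
def make_loop_function(embedding, output_projection=None):
    """Hypothetical list-based analogue of _extract_argmax_and_embed."""

    def loop_function(prev, _):
        # prev: one row of scores (or cell outputs) per batch element.
        if output_projection is not None:
            w, b = output_projection  # w: [output_size x num_symbols], b: [num_symbols]
            prev = [[sum(x * w[k][j] for k, x in enumerate(row)) + b[j]
                     for j in range(len(b))]
                    for row in prev]
        # Pick the highest-scoring symbol id for each batch element.
        prev_symbol = [max(range(len(row)), key=row.__getitem__) for row in prev]
        # Look up the embedding vector of each chosen symbol.
        return [embedding[s] for s in prev_symbol]

    return loop_function


# 3 symbols, embedding size 2.
embedding = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]

# Without projection: prev already holds one score per symbol.
loop_fn = make_loop_function(embedding)
next_inputs = loop_fn([[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]], 0)

# With projection: prev has output_size = 2 and is mapped to 3 symbol scores.
projection = ([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0])
projected_fn = make_loop_function(embedding, projection)
next_projected = projected_fn([[0.2, 0.9]], 0)
```

The second argument of loop_function is the step index; it is ignored here just as in the original.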
rnn_decoder
def rnn_decoder(decoder_inputs,
                initial_state,
                cell,
                loop_function=None,
                scope=None):
  """RNN decoder for the sequence-to-sequence model.

  Args:
    decoder_inputs: A list of 2D Tensors [batch_size x input_size].
    initial_state: 2D Tensor with shape [batch_size x cell.state_size].
    cell: core_rnn_cell.RNNCell defining the cell function and size.
    loop_function: If not None, this function will be applied to the i-th output
      in order to generate the i+1-st input, and decoder_inputs will be ignored,
      except for the first element ("GO" symbol). This can be used for decoding,
      but also for training to emulate http://arxiv.org/abs/1506.03099.
      Signature -- loop_function(prev, i) = next
        * prev is a 2D Tensor of shape [batch_size x output_size],
        * i is an integer, the step number (when advanced control is needed),
        * next is a 2D Tensor of shape [batch_size x input_size].
    scope: VariableScope for the created subgraph; defaults to "rnn_decoder".

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors with
        shape [batch_size x output_size] containing generated outputs.
      state: The state of each cell at the final time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].
        (Note that in some cases, like basic RNN cell or GRU cell, outputs and
         states can be the same. They are different for LSTM cells though.)
  """
  with variable_scope.variable_scope(scope or "rnn_decoder"):
    state = initial_state
    outputs = []
    prev = None
    for i, inp in enumerate(decoder_inputs):
      if loop_function is not None and prev is not None:
        with variable_scope.variable_scope("loop_function", reuse=True):
          inp = loop_function(prev, i)
      if i > 0:
        variable_scope.get_variable_scope().reuse_variables()
      output, state = cell(inp, state)
      outputs.append(output)
      if loop_function is not None:
        prev = output
  return outputs, state
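decoder_inputs: a list in which each element is the input at time step t_i. Each time step holds batch_size inputs, and each input (usually a word or token) is an input_size-dimensional vector.
loop_function: if loop_function is set, the first element of decoder_inputs (the "GO" symbol) is still fed in, but the inputs at all later time steps are ignored and replaced by input_{t_i+1} = loop_function(output_{t_i}, i). The loop_function defined here takes two arguments, (prev, i), and returns next.
Outputs:
outputs: every input produces one output, so outputs is a list of the same length as decoder_inputs. Each element has shape [batch_size, output_size]; output_size is determined by the cell and does not have to equal input_size.
state: the cell state at the last time step t, with shape [batch_size, cell.state_size].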
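The difference between feeding decoder_inputs as given and feeding back the previous output can be seen in a small list-based sketch of the same loop. Everything here is an assumption made for illustration: toy_cell is a hypothetical stand-in for an RNN cell (it just adds the input to the state), rnn_decoder_sketch is a hypothetical name, and variable scopes are left out.

```python
def toy_cell(inp, state):
    """Hypothetical cell: new_state = inp + state, and output = new_state."""
    new_state = [[x + s for x, s in zip(x_row, s_row)]
                 for x_row, s_row in zip(inp, state)]
    return new_state, new_state


def rnn_decoder_sketch(decoder_inputs, initial_state, cell, loop_function=None):
    state = initial_state
    outputs = []
    prev = None
    for i, inp in enumerate(decoder_inputs):
        # From the second step on, the loop function replaces the given input.
        if loop_function is not None and prev is not None:
            inp = loop_function(prev, i)
        output, state = cell(inp, state)
        outputs.append(output)
        if loop_function is not None:
            prev = output
    return outputs, state


# Three time steps, batch_size = 1, input_size = 2.
decoder_inputs = [[[1.0, 0.0]], [[5.0, 5.0]], [[7.0, 7.0]]]
initial_state = [[0.0, 0.0]]

# Inputs used as given.
given_outputs, given_state = rnn_decoder_sketch(
    decoder_inputs, initial_state, toy_cell)

# Previous output fed back as the next input; only the first input is used.
fed_outputs, fed_state = rnn_decoder_sketch(
    decoder_inputs, initial_state, toy_cell,
    loop_function=lambda prev, i: prev)
```

In the second call the values 5.0 and 7.0 never reach the cell, which is the "ignored except for the first element" behaviour described above.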
basic_rnn_seq2seq
def basic_rnn_seq2seq(encoder_inputs,
                      decoder_inputs,
                      cell,
                      dtype=dtypes.float32,
                      scope=None):
  """Basic RNN sequence-to-sequence model.

  This model first runs an RNN to encode encoder_inputs into a state vector,
  then runs decoder, initialized with the last encoder state, on decoder_inputs.
  Encoder and decoder use the same RNN cell type, but don't share parameters.

  Args:
    encoder_inputs: A list of 2D Tensors [batch_size x input_size].
    decoder_inputs: A list of 2D Tensors [batch_size x input_size].
    cell: core_rnn_cell.RNNCell defining the cell function and size.
    dtype: The dtype of the initial state of the RNN cell (default: tf.float32).
    scope: VariableScope for the created subgraph; default: "basic_rnn_seq2seq".

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors with
        shape [batch_size x output_size] containing the generated outputs.
      state: The state of each decoder cell in the final time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].
  """
  with variable_scope.variable_scope(scope or "basic_rnn_seq2seq"):
    enc_cell = copy.deepcopy(cell)
    _, enc_state = core_rnn.static_rnn(enc_cell, encoder_inputs, dtype=dtype)
    return rnn_decoder(decoder_inputs, enc_state, cell)
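encoder_inputs: a list in which each element is the input at time step t. Each time step holds batch_size inputs (words or tokens), and each token is represented by an input_size-dimensional vector (its embedding). So this is a list of [batch_size, input_size].
decoder_inputs: same as above, but the two lists may have different lengths: the former is set by encoder_max_length and the latter by decoder_max_length.
Outputs:
outputs: a list of the same length as decoder_inputs. The difference is that each element is [batch_size, output_size] rather than [batch_size, input_size], because the width of the output comes from the cell and not from the input.
state: again the cell state at the last time step, [batch_size, cell.state_size].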
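Note that a deep copy is used here:
A deep copy creates a new variable or container at a different address, and the elements inside the container are also newly allocated. Only the values are the same, so it is a complete, independent duplicate (new wine in a new bottle).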
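The difference between a shallow and a deep copy can be checked with the standard copy module. ToyCell below is a hypothetical class used only for illustration; it is not a TensorFlow cell.

```python
import copy


class ToyCell:
    """Hypothetical object holding a mutable list, standing in for a cell."""

    def __init__(self):
        self.weights = [0.0, 0.0]


cell = ToyCell()
shallow = copy.copy(cell)      # new object, but shares the same weights list
deep = copy.deepcopy(cell)     # new object with its own copy of the list

# Mutate the original after copying.
cell.weights[0] = 1.0

shares_list = shallow.weights is cell.weights   # the shallow copy sees the change
independent = deep.weights is not cell.weights  # the deep copy does not
```

This matches the docstring's remark that encoder and decoder use the same cell type but do not share parameters: the encoder works on its own copy of the cell object.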
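The encode stage uses core_rnn.static_rnn(). I am not sure how this function differs from the other RNN functions.
The decode stage is very basic: it directly calls the rnn_decoder described above to produce the final outputs and state, and returns them.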
static_rnn
The code is here. It is fairly tedious, so I will not go through it in detail.
embedding_rnn_decoder
def embedding_rnn_decoder(decoder_inputs,
                          initial_state,
                          cell,
                          num_symbols,
                          embedding_size,
                          output_projection=None,
                          feed_previous=False,
                          update_embedding_for_previous=True,
                          scope=None):
  """RNN decoder with embedding and a pure-decoding option.

  Args:
    decoder_inputs: A list of 1D batch-sized int32 Tensors (decoder inputs).
    initial_state: 2D Tensor [batch_size x cell.state_size].
    cell: core_rnn_cell.RNNCell defining the cell function.
    num_symbols: Integer, how many symbols come into the embedding.
    embedding_size: Integer, the length of the embedding vector for each symbol.
    output_projection: None or a pair (W, B) of output projection weights and
      biases; W has shape [output_size x num_symbols] and B has
      shape [num_symbols]; if provided and feed_previous=True, each fed
      previous output will first be multiplied by W and added B.
    feed_previous: Boolean; if True, only the first of decoder_inputs will be
      used (the "GO" symbol), and all other decoder inputs will be generated by:
        next = embedding_lookup(embedding, argmax(previous_output)),
      In effect, this implements a greedy decoder. It can also be used
      during training to emulate http://arxiv.org/abs/1506.03099.
      If False, decoder_inputs are used as given (the standard decoder case).
    update_embedding_for_previous: Boolean; if False and feed_previous=True,
      only the embedding for the first symbol of decoder_inputs (the "GO"
      symbol) will be updated by back propagation. Embeddings for the symbols
      generated from the decoder itself remain unchanged. This parameter has
      no effect if feed_previous=False.
    scope: VariableScope for the created subgraph; defaults to
      "embedding_rnn_decoder".

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors. The
        output is of shape [batch_size x cell.output_size] when
        output_projection is not None (and represents the dense representation
        of predicted tokens). It is of shape [batch_size x num_decoder_symbols]
        when output_projection is None.
      state: The state of each decoder cell in each time-step. This is a list
        with length len(decoder_inputs) -- one item for each time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].

  Raises:
    ValueError: When output_projection has the wrong shape.
  """
  with variable_scope.variable_scope(scope or "embedding_rnn_decoder") as scope:
    if output_projection is not None:
      dtype = scope.dtype
      proj_weights = ops.convert_to_tensor(output_projection[0], dtype=dtype)
      proj_weights.get_shape().assert_is_compatible_with([None, num_symbols])
      proj_biases = ops.convert_to_tensor(output_projection[1], dtype=dtype)
      proj_biases.get_shape().assert_is_compatible_with([num_symbols])

    embedding = variable_scope.get_variable("embedding",
                                            [num_symbols, embedding_size])
    loop_function = _extract_argmax_and_embed(
        embedding, output_projection,
        update_embedding_for_previous) if feed_previous else None
    emb_inp = (embedding_ops.embedding_lookup(embedding, i)
               for i in decoder_inputs)
    return rnn_decoder(
        emb_inp, initial_state, cell, loop_function=loop_function)
We have just covered a basic decoder called rnn_decoder: rnn_decoder(decoder_inputs, initial_state, cell, loop_function=None, scope=None). Now for a slightly more advanced one.
Comparing the two, this decoder has no loop_function argument and adds num_symbols, embedding_size, output_projection=None, feed_previous=False, and update_embedding_for_previous=True. What are all of these?
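Parameters:
decoder_inputs: since this decoder advertises embedding, its input necessarily differs from that of rnn_decoder. Each input is now 1-D, [batch_size]. In other words, we no longer do the embedding ourselves; we pass in the index of each token in the vocab (its id), and the id-to-embedding conversion is done internally.
num_symbols: this is simply vocab_size.
embedding_size: the number of dimensions each token is embedded into, for example 100.
output_projection: (W, b), a mapping applied to the output. Why is a mapping needed? The input is a list of [batch_size] id tensors; the internal embedding turns each one into embedded_input = [batch_size, embedding_size], and after the cell we get [batch_size, output_size] (this part is what rnn_decoder did earlier). If we then set feed_previous=True, so that the output of the previous time step becomes the input of the next one, we have to pick the highest-scoring token among the vocab_size candidates from the previous output, i.e. argmax(previous_output). The process is as described in the figure below:
However, the output currently has dimension output_size, which does not tell us the score of each vocabulary entry. It therefore has to be mapped from output_size to vocab_size (num_symbols here).
We know that x (the output at some time step) has shape [batch_size, output_size] and that the mapping is xW + b, so W has shape [output_size, num_symbols] and b has shape [num_symbols].
update_embedding_for_previous: if the previous output is not used as the current input (feed_previous=False), this parameter has no effect. Otherwise it defaults to True; if it is set to False, the embeddings of fed-back tokens are not updated, so backpropagation only updates the embedding of "GO", and the embeddings of the other tokens (those generated by the decoder) stay unchanged.
Outputs:
outputs: following the docstring above, if output_projection=None, no mapping is involved and the outputs are taken to be vocabulary-sized already, so they are a list of [batch_size, num_symbols]. If output_projection is not None, the outputs are a list of [batch_size, cell.output_size], and W and b are only used when the previous output is fed back.
state: same as above.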
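The shapes in this parameter list can be traced with a small list-based sketch: ids are looked up in an embedding table, and a cell output is projected to one score per symbol. All sizes and values are made up for illustration, shape and xw_plus_b are hypothetical helpers, and the cell itself is skipped (its output is simply assumed to be [batch_size, output_size]).

```python
num_symbols, embedding_size, output_size = 5, 3, 4
batch_size = 2

# Embedding table: [num_symbols, embedding_size]; row s is filled with float(s).
embedding = [[float(s)] * embedding_size for s in range(num_symbols)]

# decoder_inputs: 3 time steps, each a [batch_size] list of token ids.
decoder_inputs = [[0, 2], [1, 4], [3, 3]]

# Lookup turns each [batch_size] step into [batch_size, embedding_size].
emb_inp = [[embedding[token] for token in step] for step in decoder_inputs]


def shape(matrix):
    """Return (rows, columns) of a rectangular nested list."""
    return (len(matrix), len(matrix[0]))


def xw_plus_b(x, w, b):
    """Compute xW + b for x: [batch, output_size], w: [output_size, num_symbols]."""
    return [[sum(xk * w[k][j] for k, xk in enumerate(row)) + b[j]
             for j in range(len(b))]
            for row in x]


# Assumed cell output for one time step: [batch_size, output_size].
cell_output = [[1.0] * output_size for _ in range(batch_size)]
w = [[0.0] * num_symbols for _ in range(output_size)]
b = [0.0] * num_symbols
logits = xw_plus_b(cell_output, w, b)

embedded_shape = shape(emb_inp[0])
logits_shape = shape(logits)
```

Only after this projection does an argmax over the last dimension correspond to a token id.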
embedding_rnn_seq2seq
def embedding_rnn_seq2seq(encoder_inputs,
                          decoder_inputs,
                          cell,
                          num_encoder_symbols,
                          num_decoder_symbols,
                          embedding_size,
                          output_projection=None,
                          feed_previous=False,
                          dtype=None,
                          scope=None):
  """Embedding RNN sequence-to-sequence model.

  This model first embeds encoder_inputs by a newly created embedding (of shape
  [num_encoder_symbols x input_size]). Then it runs an RNN to encode
  embedded encoder_inputs into a state vector. Next, it embeds decoder_inputs
  by another newly created embedding (of shape [num_decoder_symbols x
  input_size]). Then it runs RNN decoder, initialized with the last
  encoder state, on embedded decoder_inputs.

  Args:
    encoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
    decoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
    cell: core_rnn_cell.RNNCell defining the cell function and size.
    num_encoder_symbols: Integer; number of symbols on the encoder side.
    num_decoder_symbols: Integer; number of symbols on the decoder side.
    embedding_size: Integer, the length of the embedding vector for each symbol.
    output_projection: None or a pair (W, B) of output projection weights and
      biases; W has shape [output_size x num_decoder_symbols] and B has
      shape [num_decoder_symbols]; if provided and feed_previous=True, each
      fed previous output will first be multiplied by W and added B.
    feed_previous: Boolean or scalar Boolean Tensor; if True, only the first
      of decoder_inputs will be used (the "GO" symbol), and all other decoder
      inputs will be taken from previous outputs (as in embedding_rnn_decoder).
      If False, decoder_inputs are used as given (the standard decoder case).
    dtype: The dtype of the initial state for both the encoder and encoder
      rnn cells (default: tf.float32).
    scope: VariableScope for the created subgraph; defaults to
      "embedding_rnn_seq2seq"

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors. The
        output is of shape [batch_size x cell.output_size] when
        output_projection is not None (and represents the dense representation
        of predicted tokens). It is of shape [batch_size x num_decoder_symbols]
        when output_projection is None.
      state: The state of each decoder cell in each time-step. This is a list
        with length len(decoder_inputs) -- one item for each time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].
  """
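Since there is an embedding_rnn_decoder, there is also a matching embedding_rnn_seq2seq. For comparison, the one covered earlier was basic_rnn_seq2seq(encoder_inputs, decoder_inputs, cell, dtype=dtypes.float32, scope=None).
inputs: as before, since the embedding is done internally, the inputs are a list of [batch_size] tensors in which each position is just a token id. Internally an embedding wrapper performs the lookup and produces a list of [batch_size, embedding_size].
By comparison, there are a few more parameters:
num_encoder_symbols: put simply, the vocab_size on the encoder side. The encoder and decoder vocabularies differ mainly in translation tasks between different languages; for plain Chinese-to-Chinese generation the two sides have the same vocabulary size.
num_decoder_symbols: same as above, for the decoder side.
embedding_size: how many dimensions the vector for each vocabulary entry has.
output_projection=None: same meaning as in embedding_rnn_decoder.
feed_previous=False: if feed_previous is a plain True or False, the function directly returns the result of embedding_rnn_decoder. The notable point is that feed_previous can also be a boolean tensor (I have no need for this at the moment).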
attention_decoder
def attention_decoder(decoder_inputs,
                      initial_state,
                      attention_states,
                      cell,
                      output_size=None,
                      num_heads=1,
                      loop_function=None,
                      dtype=None,
                      scope=None,
                      initial_state_attention=False):
  """RNN decoder with attention for the sequence-to-sequence model.

  In this context "attention" means that, during decoding, the RNN can look up
  information in the additional tensor attention_states, and it does this by
  focusing on a few entries from the tensor. This model has proven to yield
  especially good results in a number of sequence-to-sequence tasks. This
  implementation is based on http://arxiv.org/abs/1412.7449 (see below for
  details). It is recommended for complex sequence-to-sequence tasks.

  Args:
    decoder_inputs: A list of 2D Tensors [batch_size x input_size].
    initial_state: 2D Tensor [batch_size x cell.state_size].
    attention_states: 3D Tensor [batch_size x attn_length x attn_size].
    cell: core_rnn_cell.RNNCell defining the cell function and size.
    output_size: Size of the output vectors; if None, we use cell.output_size.
    num_heads: Number of attention heads that read from attention_states.
    loop_function: If not None, this function will be applied to i-th output
      in order to generate i+1-th input, and decoder_inputs will be ignored,
      except for the first element ("GO" symbol). This can be used for decoding,
      but also for training to emulate http://arxiv.org/abs/1506.03099.
      Signature -- loop_function(prev, i) = next
        * prev is a 2D Tensor of shape [batch_size x output_size],
        * i is an integer, the step number (when advanced control is needed),
        * next is a 2D Tensor of shape [batch_size x input_size].
    dtype: The dtype to use for the RNN initial state (default: tf.float32).
    scope: VariableScope for the created subgraph; default: "attention_decoder".
    initial_state_attention: If False (default), initial attentions are zero.
      If True, initialize the attentions from the initial state and attention
      states -- useful when we wish to resume decoding from a previously
      stored decoder state and attention states.

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors of
        shape [batch_size x output_size]. These represent the generated outputs.
        Output i is computed from input i (which is either the i-th element
        of decoder_inputs or loop_function(output {i-1}, i)) as follows.
        First, we run the cell on a combination of the input and previous
        attention masks:
          cell_output, new_state = cell(linear(input, prev_attn), prev_state).
        Then, we calculate new attention masks:
          new_attn = softmax(V^T * tanh(W * attention_states + U * new_state))
        and then we calculate the output:
          output = linear(cell_output, new_attn).
      state: The state of each decoder cell the final time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].

  Raises:
    ValueError: when num_heads is not positive, there are no inputs, shapes
      of attention_states are not set, or input size cannot be inferred
      from the input.
  """
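Having covered embedding_rnn_decoder, let us now look at attention_decoder.
Compared with the basic rnn_decoder (rnn_decoder(decoder_inputs, initial_state, cell, loop_function=None, scope=None)), it has a few more parameters:
attention_states: the additional information that the decoder can look up while decoding, a 3D tensor [batch_size, attn_length, attn_size].
output_size=None: if None, it defaults to cell.output_size.
num_heads=1: the number of attention heads that read from attention_states, i.e. how many places in attention_states are attended to at once. The default is a single head.
initial_state_attention=False: if True, the attention is initialized from the state and attention_states; if False, the attention is initialized to zero.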
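The attention formula in the docstring, new_attn = softmax(V^T * tanh(W * attention_states + U * new_state)), can be sketched for a single head and a single batch element in plain Python. This is a simplified illustration under assumptions: W and U are taken to be identity maps (so the decoder state is passed in already projected as query), attend is a hypothetical name, and the returned context is the attention-weighted sum of attention_states.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(attention_states, query, v):
    """Single-head attention for one batch element.

    attention_states: attn_length vectors, each of length attn_size.
    query: the (assumed already projected) decoder state, length attn_size.
    v: the scoring vector V, length attn_size.
    """
    # One scalar score per attention position: v^T tanh(h + query).
    scores = [sum(vk * math.tanh(hk + qk) for vk, hk, qk in zip(v, h, query))
              for h in attention_states]
    weights = softmax(scores)
    # Context vector: weighted sum of the attention states.
    context = [sum(w * h[k] for w, h in zip(weights, attention_states))
               for k in range(len(v))]
    return weights, context


# attn_length = 3, attn_size = 2.
attention_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend(attention_states, query=[0.5, -0.5], v=[1.0, 1.0])
```

With num_heads greater than 1, one such distribution would be computed per head, each with its own parameters.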
embedding_attention_decoder
def embedding_attention_decoder(decoder_inputs,
                                initial_state,
                                attention_states,
                                cell,
                                num_symbols,
                                embedding_size,
                                num_heads=1,
                                output_size=None,
                                output_projection=None,
                                feed_previous=False,
                                update_embedding_for_previous=True,
                                dtype=None,
                                scope=None,
                                initial_state_attention=False):
  """RNN decoder with embedding and attention and a pure-decoding option.

  Args:
    decoder_inputs: A list of 1D batch-sized int32 Tensors (decoder inputs).
    initial_state: 2D Tensor [batch_size x cell.state_size].
    attention_states: 3D Tensor [batch_size x attn_length x attn_size].
    cell: core_rnn_cell.RNNCell defining the cell function.
    num_symbols: Integer, how many symbols come into the embedding.
    embedding_size: Integer, the length of the embedding vector for each symbol.
    num_heads: Number of attention heads that read from attention_states.
    output_size: Size of the output vectors; if None, use output_size.
    output_projection: None or a pair (W, B) of output projection weights and
      biases; W has shape [output_size x num_symbols] and B has shape
      [num_symbols]; if provided and feed_previous=True, each fed previous
      output will first be multiplied by W and added B.
    feed_previous: Boolean; if True, only the first of decoder_inputs will be
      used (the "GO" symbol), and all other decoder inputs will be generated by:
        next = embedding_lookup(embedding, argmax(previous_output)),
      In effect, this implements a greedy decoder. It can also be used
      during training to emulate http://arxiv.org/abs/1506.03099.
      If False, decoder_inputs are used as given (the standard decoder case).
    update_embedding_for_previous: Boolean; if False and feed_previous=True,
      only the embedding for the first symbol of decoder_inputs (the "GO"
      symbol) will be updated by back propagation. Embeddings for the symbols
      generated from the decoder itself remain unchanged. This parameter has
      no effect if feed_previous=False.
    dtype: The dtype to use for the RNN initial states (default: tf.float32).
    scope: VariableScope for the created subgraph; defaults to
      "embedding_attention_decoder".
    initial_state_attention: If False (default), initial attentions are zero.
      If True, initialize the attentions from the initial state and attention
      states -- useful when we wish to resume decoding from a previously
      stored decoder state and attention states.

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors with
        shape [batch_size x output_size] containing the generated outputs.
      state: The state of each decoder cell at the final time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].

  Raises:
    ValueError: When output_projection has the wrong shape.
  """
This is essentially a combination of the embedding_rnn_decoder and attention_decoder described above.
embedding_attention_seq2seq
def embedding_attention_seq2seq(encoder_inputs,
                                decoder_inputs,
                                cell,
                                num_encoder_symbols,
                                num_decoder_symbols,
                                embedding_size,
                                num_heads=1,
                                output_projection=None,
                                feed_previous=False,
                                dtype=None,
                                scope=None,
                                initial_state_attention=False)
The seq2seq model that corresponds to embedding_attention_decoder.
sequence_loss_by_example
def sequence_loss_by_example(logits,
                             targets,
                             weights,
                             average_across_timesteps=True,
                             softmax_loss_function=None,
                             name=None):
  """Weighted cross-entropy loss for a sequence of logits (per example).

  Args:
    logits: List of 2D Tensors of shape [batch_size x num_decoder_symbols].
    targets: List of 1D batch-sized int32 Tensors of the same length as logits.
    weights: List of 1D batch-sized float-Tensors of the same length as logits.
    average_across_timesteps: If set, divide the returned cost by the total
      label weight.
    softmax_loss_function: Function (labels-batch, inputs-batch) -> loss-batch
      to be used instead of the standard softmax (the default if this is None).
    name: Optional name for this operation, default: "sequence_loss_by_example".

  Returns:
    1D batch-sized float Tensor: The log-perplexity for each sequence.

  Raises:
    ValueError: If len(logits) is different from len(targets) or len(weights).
  """
  if len(targets) != len(logits) or len(weights) != len(logits):
    raise ValueError("Lengths of logits, weights, and targets must be the same "
                     "%d, %d, %d." % (len(logits), len(weights), len(targets)))
  with ops.name_scope(name, "sequence_loss_by_example",
                      logits + targets + weights):
    log_perp_list = []
    for logit, target, weight in zip(logits, targets, weights):
      if softmax_loss_function is None:
        # TODO(irving,ebrevdo): This reshape is needed because
        # sequence_loss_by_example is called with scalars sometimes, which
        # violates our general scalar strictness policy.
        target = array_ops.reshape(target, [-1])
        crossent = nn_ops.sparse_softmax_cross_entropy_with_logits(
            labels=target, logits=logit)
      else:
        crossent = softmax_loss_function(target, logit)
      log_perp_list.append(crossent * weight)
    log_perps = math_ops.add_n(log_perp_list)
    if average_across_timesteps:
      total_size = math_ops.add_n(weights)
      total_size += 1e-12  # Just to avoid division by 0 for all-0 weights.
      log_perps /= total_size
  return log_perps
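Return value:
1D batch-sized float Tensor: the log perplexity computed for each sequence (a batch contains batch_size sequences). This is also what by_example in the name means.
Inputs:
(Note: since every argument is a list with one tensor per time step, all sequences in a batch are presumably padded to the same length, so they share the same time length.)
logits: a list holding the outputs at successive time steps. Each time step's output covers the whole batch, and the output for each input is a score over the entire vocab, hence num_decoder_symbols. So logits is a list of [batch_size, num_decoder_symbols].
targets: a list holding the targets at successive time steps. Each time step has batch_size inputs and therefore batch_size targets, so this is a list of [batch_size].
weights: each example has a weight for its current token at every time step, so this is a list of [batch_size].
Question: what are the weights for, and why does every token get a weight? A common use is masking: padded positions are given weight 0 so that they do not contribute to the loss.
Reading the code:
First a crossent of shape [batch_size] is computed. It is multiplied by the weight, which still gives [batch_size]: the weighted cross-entropy of each example at the current time step t (batch_size values). This is appended to log_perp_list (which ends up as a list of [batch_size]).
After the loop over all time steps has finished, the entries are summed over time, giving a variable of shape [batch_size] called log_perps. If average_across_timesteps is set, it is then divided by the total weight of each example.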
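The computation can be reproduced on nested lists to see how the weights enter. The sketch below is an illustration only: sparse_softmax_xent and sequence_loss_by_example_sketch are hypothetical names, and the example uses a weight of 0 to mask one position, which is one common way such weights are used (for instance for padding).

```python
import math


def sparse_softmax_xent(logit_row, target):
    """Cross-entropy of one row of logits against an integer target id."""
    m = max(logit_row)
    log_z = m + math.log(sum(math.exp(x - m) for x in logit_row))
    return log_z - logit_row[target]


def sequence_loss_by_example_sketch(logits, targets, weights,
                                    average_across_timesteps=True):
    batch_size = len(targets[0])
    log_perps = [0.0] * batch_size
    # Accumulate the weighted cross-entropy over time steps.
    for logit, target, weight in zip(logits, targets, weights):
        for b in range(batch_size):
            log_perps[b] += sparse_softmax_xent(logit[b], target[b]) * weight[b]
    if average_across_timesteps:
        for b in range(batch_size):
            # Total weight per example; 1e-12 avoids division by zero.
            total = sum(w[b] for w in weights) + 1e-12
            log_perps[b] /= total
    return log_perps


# 2 time steps, batch_size = 2, 2 symbols; uniform logits give ln(2) per token.
logits = [[[0.0, 0.0], [0.0, 0.0]],
          [[0.0, 0.0], [0.0, 0.0]]]
targets = [[0, 1], [1, 0]]
# The second example's last step has weight 0 and is therefore ignored.
weights = [[1.0, 1.0], [1.0, 0.0]]
losses = sequence_loss_by_example_sketch(logits, targets, weights)
```

Because the division uses the total weight rather than the number of time steps, the masked example still averages to ln(2) over its one remaining token.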
sequence_loss
def sequence_loss(logits,
                  targets,
                  weights,
                  average_across_timesteps=True,
                  average_across_batch=True,
                  softmax_loss_function=None,
                  name=None):
  """Weighted cross-entropy loss for a sequence of logits, batch-collapsed.

  Args:
    logits: List of 2D Tensors of shape [batch_size x num_decoder_symbols].
    targets: List of 1D batch-sized int32 Tensors of the same length as logits.
    weights: List of 1D batch-sized float-Tensors of the same length as logits.
    average_across_timesteps: If set, divide the returned cost by the total
      label weight.
    average_across_batch: If set, divide the returned cost by the batch size.
    softmax_loss_function: Function (labels-batch, inputs-batch) -> loss-batch
      to be used instead of the standard softmax (the default if this is None).
    name: Optional name for this operation, defaults to "sequence_loss".

  Returns:
    A scalar float Tensor: The average log-perplexity per symbol (weighted).

  Raises:
    ValueError: If len(logits) is different from len(targets) or len(weights).
  """
  with ops.name_scope(name, "sequence_loss", logits + targets + weights):
    cost = math_ops.reduce_sum(
        sequence_loss_by_example(
            logits,
            targets,
            weights,
            average_across_timesteps=average_across_timesteps,
            softmax_loss_function=softmax_loss_function))
    if average_across_batch:
      batch_size = array_ops.shape(targets[0])[0]
      return cost / math_ops.cast(batch_size, cost.dtype)
    else:
      return cost
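The core is still the sequence_loss_by_example described above; the only addition is that its [batch_size] result is summed. With the default average_across_batch, the function returns sum / batch_size, the average log perplexity per sequence. If averaging is turned off, it returns the sum of the log perplexities over the whole batch.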
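The batch-collapsing step is small enough to show on a plain list. This sketch assumes the per-example losses have already been computed; collapse_batch is a hypothetical helper name and the numbers are made up.

```python
def collapse_batch(per_example_losses, average_across_batch=True):
    """Sum per-example losses and optionally divide by the batch size."""
    cost = sum(per_example_losses)
    if average_across_batch:
        return cost / len(per_example_losses)
    return cost


per_example_losses = [0.5, 1.5, 1.0]   # shape [batch_size], batch_size = 3
mean_loss = collapse_batch(per_example_losses)
summed_loss = collapse_batch(per_example_losses, average_across_batch=False)
```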
model_with_buckets
def model_with_buckets(encoder_inputs,
                       decoder_inputs,
                       targets,
                       weights,
                       buckets,
                       seq2seq,
                       softmax_loss_function=None,
                       per_example_loss=False,
                       name=None):
  """Create a sequence-to-sequence model with support for bucketing.

  The seq2seq argument is a function that defines a sequence-to-sequence model,
  e.g., seq2seq = lambda x, y: basic_rnn_seq2seq(
      x, y, core_rnn_cell.GRUCell(24))

  Args:
    encoder_inputs: A list of Tensors to feed the encoder; first seq2seq input.
    decoder_inputs: A list of Tensors to feed the decoder; second seq2seq input.
    targets: A list of 1D batch-sized int32 Tensors (desired output sequence).
    weights: List of 1D batch-sized float-Tensors to weight the targets.
    buckets: A list of pairs of (input size, output size) for each bucket.
    seq2seq: A sequence-to-sequence model function; it takes 2 input that
      agree with encoder_inputs and decoder_inputs, and returns a pair
      consisting of outputs and states (as, e.g., basic_rnn_seq2seq).
    softmax_loss_function: Function (labels-batch, inputs-batch) -> loss-batch
      to be used instead of the standard softmax (the default if this is None).
    per_example_loss: Boolean. If set, the returned loss will be a batch-sized
      tensor of losses for each sequence in the batch. If unset, it will be
      a scalar with the averaged loss from all examples.
    name: Optional name for this operation, defaults to "model_with_buckets".

  Returns:
    A tuple of the form (outputs, losses), where:
      outputs: The outputs for each bucket. Its j'th element consists of a list
        of 2D Tensors. The shape of output tensors can be either
        [batch_size x output_size] or [batch_size x num_decoder_symbols]
        depending on the seq2seq model used.
      losses: List of scalar Tensors, representing losses for each bucket, or,
        if per_example_loss is set, a list of 1D batch-sized float Tensors.

  Raises:
    ValueError: If length of encoder_inputs, targets, or weights is smaller
      than the largest (last) bucket.
  """
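Parameters:
encoder_inputs: at first I wondered whether these inputs are token ids or input_size-dimensional vectors. On reflection, the concrete shape depends on the function passed in as seq2seq below. That function normally takes only two arguments, x and y, corresponding to encoder_inputs and decoder_inputs (any other arguments a particular seq2seq model needs have to be supplied inside this user-defined function). So if we use an embedding model such as embedding_rnn_seq2seq, the inputs should be ids; otherwise they should be input_size-dimensional vectors.
targets: a list, because every time step has a target. Each time step takes batch_size inputs, so the target at each time step has the form [batch_size], which makes targets a list of [batch_size].
buckets: a list of (input_size, output_size) pairs, i.e. the encoder length and the decoder length of each bucket.
per_example_loss: defaults to False, in which case the loss of each bucket is a scalar (the result of sequence_loss). If set to True, each loss has shape [batch_size] (the result of sequence_loss_by_example described earlier).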
Implementation:
for j, bucket in enumerate(buckets):
  with variable_scope.variable_scope(
      variable_scope.get_variable_scope(), reuse=True if j > 0 else None):
    bucket_outputs, _ = seq2seq(encoder_inputs[:bucket[0]],
                                decoder_inputs[:bucket[1]])
    outputs.append(bucket_outputs)
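The implementation shows that, with for example three buckets, buckets=[(2, 4), (5, 7), (8, 10)], the first bucket is (2, 4): the first 2 tokens of every sequence (batch_size of them) in encoder_inputs are taken, and likewise the first 4 tokens of decoder_inputs (the list index of encoder_inputs is the time step).
The truncated parts are then passed through seq2seq, which returns a list of [batch_size, output_size] (this list has length 4, since the output length follows the decoder length), and this result is appended to outputs.
The final outputs is therefore a list whose length is the number of buckets (3 here), and each of its elements is a list of a different length (the lengths differ because the max_decoder_length defined by each bucket differs, growing from one bucket to the next).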
    if per_example_loss:
      losses.append(
          sequence_loss_by_example(
              outputs[-1],
              targets[:bucket[1]],
              weights[:bucket[1]],
              softmax_loss_function=softmax_loss_function))
    else:
      losses.append(
          sequence_loss(
              outputs[-1],
              targets[:bucket[1]],
              weights[:bucket[1]],
              softmax_loss_function=softmax_loss_function))
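Once the outputs of the current bucket have been computed, its loss is computed next. The current bucket's output has just been appended, so outputs[-1] is exactly that output. Because decoder_inputs was truncated, targets and weights have to be truncated to the same length. This gives the loss of the current bucket, which is appended to losses.
So in the final outputs and losses, indexing by the bucket's idx gives the output and loss for that bucket.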
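The slicing done for each bucket can be sketched without any model at all. In this illustration the per-step tensors are replaced by placeholder strings, and slice_for_buckets is a hypothetical helper that only reproduces the encoder_inputs[:bucket[0]] and decoder_inputs[:bucket[1]] truncation.

```python
def slice_for_buckets(encoder_inputs, decoder_inputs, buckets):
    """Return the (encoder, decoder) input prefixes used for each bucket."""
    sliced = []
    for enc_len, dec_len in buckets:
        sliced.append((encoder_inputs[:enc_len], decoder_inputs[:dec_len]))
    return sliced


buckets = [(2, 4), (5, 7), (8, 10)]

# One placeholder per time step, sized for the largest (last) bucket.
encoder_inputs = ["enc_t%d" % t for t in range(8)]
decoder_inputs = ["dec_t%d" % t for t in range(10)]

sliced = slice_for_buckets(encoder_inputs, decoder_inputs, buckets)
lengths = [(len(enc), len(dec)) for enc, dec in sliced]
first_bucket_encoder = sliced[0][0]
```

Each bucket sees a prefix of the same input lists, which is why the inputs must be at least as long as the largest bucket.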