The attention mechanism in seq2seq: principle, code, and my personal understanding


The attention score between decoder step i and source position j is

    e_{ij} = a(s_{i-1}, h_j)

where a is the alignment model that scores how well the input around position j matches the output at step i.

The alignment model here is a small feed-forward network:

    a(s_{i-1}, h_j) = v_a^\top \tanh(W_a s_{i-1} + U_a h_j)

where s_{i-1} is the decoder hidden state at step i-1 (prev_state in the code below), h_j is the j-th encoder output, and W_a, U_a, v_a are learned parameters (attention_W, attention_U, attention_V in the code).

The scores are normalized with a softmax over all source positions:

    \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T} \exp(e_{ik})}

where T is the length of the source sequence.

The context vector is then the weighted sum of the encoder outputs:

    c_i = \sum_{j=1}^{T} \alpha_{ij} h_j

Putting it all together, the decoder state update at step i takes the context vector along with the previous state and the current input:

    s_i = f(s_{i-1}, y_{i-1}, c_i)

where in the code below this is realized by concatenating c_i onto the decoder input before feeding the cell.
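To make the formulas concrete, here is a minimal NumPy sketch of a single decoder step. It is a toy re-implementation of the equations above, not code from the repository cited below; the function name, the shapes, and the toy numbers are all illustrative.

import numpy as np

def bahdanau_attention_step(prev_state, enc_outputs, W_a, U_a, v_a):
    """One decoder step of additive attention.
    prev_state:  (d_dec,)    decoder hidden state s_{i-1}
    enc_outputs: (T, d_enc)  encoder outputs h_1 .. h_T
    W_a: (d_dec, d_att), U_a: (d_enc, d_att), v_a: (d_att,)
    """
    # e_{ij} = v_a^T tanh(W_a s_{i-1} + U_a h_j), computed for all j at once
    scores = np.tanh(prev_state @ W_a + enc_outputs @ U_a) @ v_a   # shape (T,)
    # alpha_{ij}: softmax over the T source positions
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                           # shape (T,)
    # c_i = sum_j alpha_{ij} h_j
    context = alpha @ enc_outputs                                  # shape (d_enc,)
    return context, alpha

# toy example: T=5 source positions, d_enc=8, d_dec=6, d_att=4
rng = np.random.default_rng(0)
context, alpha = bahdanau_attention_step(
    rng.normal(size=6), rng.normal(size=(5, 8)),
    rng.normal(size=(6, 4)), rng.normal(size=(8, 4)), rng.normal(size=4))
print(context.shape, alpha.sum())  # (8,) 1.0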
Looking at what goes in: all of the encoder outputs and the decoder state at each step are fed in together and mixed, and what comes out first is essentially a score for each encoder output, which is then softmax-normalized and used to weight those outputs into the context vector.

import tensorflow as tf  # TF 1.x API

def attention(self, prev_state, enc_outputs):
    """
    Attention model for Neural Machine Translation
    :param prev_state: the decoder hidden state at time i-1
    :param enc_outputs: the encoder outputs, a length 'T' list.
    """
    e_i = []
    c_i = []
    for output in enc_outputs:
        # e_ij = V^T tanh(W * s_{i-1} + U * h_j)
        atten_hidden = tf.tanh(tf.add(tf.matmul(prev_state, self.attention_W),
                                      tf.matmul(output, self.attention_U)))
        e_i_j = tf.matmul(atten_hidden, self.attention_V)
        e_i.append(e_i_j)
    e_i = tf.concat(e_i, axis=1)
    # alpha_ij = softmax of the scores over the source positions
    alpha_i = tf.nn.softmax(e_i)
    alpha_i = tf.split(alpha_i, self.num_steps, 1)
    for alpha_i_j, output in zip(alpha_i, enc_outputs):
        c_i_j = tf.multiply(alpha_i_j, output)
        c_i.append(c_i_j)
    # c_i = sum_j alpha_ij * h_j  (hidden_dim * 2 because the encoder is bidirectional)
    c_i = tf.reshape(tf.concat(c_i, axis=1), [-1, self.num_steps, self.hidden_dim * 2])
    c_i = tf.reduce_sum(c_i, 1)
    return c_i

# the corresponding decode
def decode(self, cell, init_state, encoder_outputs, loop_function=None):
    outputs = []
    prev = None
    state = init_state
    for i, inp in enumerate(self.decoder_inputs_emb):  # decoder_inputs_emb is built from tf.placeholder
        # if loop_function is not None and prev is not None:
        #     with tf.variable_scope("loop_function", reuse=True):
        #         inp = loop_function(prev, i)
        # if i > 0:
        #     tf.get_variable_scope().reuse_variables()
        c_i = self.attention(state, encoder_outputs)
        inp = tf.concat([inp, c_i], axis=1)
        output, state = cell(inp, state)  # without attention, only the decoder input and state would be fed to the cell
        outputs.append(output)
        if loop_function is not None:
            prev = output
    return outputs
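For context, here is a minimal sketch of how the two methods above could be wired up. It is not taken from the linked repo: the class name, the hyper-parameter values, and the choice of a GRU decoder with a bidirectional encoder (which is what the hidden_dim * 2 reshape suggests) are my assumptions; only attention_W/U/V, num_steps, hidden_dim and decoder_inputs_emb correspond to names that actually appear in the code above.

class NMTSketch(object):  # hypothetical container for the excerpted methods
    def __init__(self, num_steps=10, hidden_dim=128, attn_dim=64, emb_dim=32):
        self.num_steps = num_steps
        self.hidden_dim = hidden_dim
        # shapes implied by attention():
        #   prev_state  [batch, hidden_dim]      -> attention_W [hidden_dim, attn_dim]
        #   encoder h_j [batch, 2 * hidden_dim]  -> attention_U [2 * hidden_dim, attn_dim]
        #   e_i_j       [batch, 1]               -> attention_V [attn_dim, 1]
        self.attention_W = tf.get_variable("attention_W", [hidden_dim, attn_dim])
        self.attention_U = tf.get_variable("attention_U", [2 * hidden_dim, attn_dim])
        self.attention_V = tf.get_variable("attention_V", [attn_dim, 1])
        # one embedded decoder input per target step
        self.decoder_inputs_emb = [
            tf.placeholder(tf.float32, [None, emb_dim]) for _ in range(num_steps)]

# reuse the two functions excerpted above as methods of the sketch class
NMTSketch.attention = attention
NMTSketch.decode = decode

batch_size = 16
model = NMTSketch()
# stand-ins for real bidirectional encoder outputs: num_steps tensors of [batch, 2 * hidden_dim]
encoder_outputs = [tf.zeros([batch_size, 2 * model.hidden_dim]) for _ in range(model.num_steps)]
# a GRU keeps the state a single tensor, so attention(state, ...) stays well-defined;
# in recent TF 1.x the cell reuses its kernel across the loop iterations, which is why
# the reuse_variables() lines above could be left commented out
decoder_cell = tf.nn.rnn_cell.GRUCell(model.hidden_dim)
init_state = decoder_cell.zero_state(batch_size, tf.float32)
decoder_outputs = model.decode(decoder_cell, init_state, encoder_outputs)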

Code taken from https://github.com/pemywei/attention-nmt
