Writing a mini-batch BiLSTM with Theano

Source: Internet | Published by: 山海经图鉴软件 | Editor: 程序博客网 | Time: 2024/06/06 18:05

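I have been working on my graduation project recently, and the dataset is fairly large, so I had to switch to mini-batch training. After asking a classmate, I found that the idea behind mini-batching is fairly simple: pad every sentence to the length of the longest sentence in the batch, add a mask matrix that marks the real length of each sentence, and at every scan step of the LSTM pass the intermediate c and h values through the mask. Without further ado, here is the code.
First, I initialize the parameters the model needs: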

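    import numpy
    import theano
    import theano.tensor as T

    # rng (e.g. a numpy.random.RandomState), initialize_range, n_in and n_h
    # are assumed to be defined beforehand.
    # Leading axis of size 2: direction (0 = forward, 1 = backward).
    # Axis of size 4: gate (0 = input, 1 = forget, 2 = cell candidate, 3 = output).
    # The scan code later reads these shared variables as attributes of self.

    # Input-to-hidden weights
    W_value = numpy.asarray(rng.uniform(
        low = -initialize_range,
        high = initialize_range,
        size = (2, 4, n_in, n_h)), dtype = theano.config.floatX)
    W_s = theano.shared(value = W_value, name = "W_s", borrow = True)

    # Hidden-to-hidden weights
    U_value = numpy.asarray(rng.uniform(
        low = -initialize_range,
        high = initialize_range,
        size = (2, 4, n_h, n_h)), dtype = theano.config.floatX)
    U_s = theano.shared(value = U_value, name = "U_s", borrow = True)

    # Biases
    b_value = numpy.asarray(rng.uniform(
        low = -initialize_range,
        high = initialize_range,
        size = (2, 4, n_h)), dtype = theano.config.floatX)
    b_s = theano.shared(value = b_value, name = "b_s", borrow = True)

    # Cell-to-output-gate weights, used in T.dot(c, vo)
    v_o_value = numpy.asarray(rng.uniform(
        low = -initialize_range,
        high = initialize_range,
        size = (2, n_h, n_h)), dtype = theano.config.floatX)
    vo = theano.shared(value = v_o_value, name = "v_o", borrow = True)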

Next, here is the code for an ordinary BiLSTM that processes one sentence at a time:

    def _step(x, c_, h_, W, U, b, vo):
        # x: (feature_length,); c_, h_: (n_h,) from the previous step
        i = T.nnet.sigmoid(T.dot(x, W[0]) + T.dot(h_, U[0]) + b[0])  # input gate
        f = T.nnet.sigmoid(T.dot(x, W[1]) + T.dot(h_, U[1]) + b[1])  # forget gate
        c = i * (T.tanh(T.dot(x, W[2]) + T.dot(h_, U[2]) + b[2])) + f * c_
        o = T.nnet.sigmoid(T.dot(x, W[3]) + T.dot(h_, U[3]) + T.dot(c, vo) + b[3])
        h = o * T.tanh(c)
        # The return order must match outputs_info: cell state, then hidden state
        return [c, h]

    # Forward direction: read the sentence left to right
    [c_l, h_l], _ = theano.scan(fn = _step,
        sequences = seq_x,
        outputs_info = [T.alloc(numpy.asarray(0., dtype=theano.config.floatX), n_h),
                        T.alloc(numpy.asarray(0., dtype=theano.config.floatX), n_h)],
        non_sequences = [self.W_s[0], self.U_s[0], self.b_s[0], self.vo[0]],
        n_steps = seq_x.shape[0])

    # Backward direction: read the reversed sentence. c_r and h_r come out in
    # reversed time order, so use h_r[::-1] to align them with h_l.
    [c_r, h_r], _ = theano.scan(fn = _step,
        sequences = seq_x[::-1],
        outputs_info = [T.alloc(numpy.asarray(0., dtype=theano.config.floatX), n_h),
                        T.alloc(numpy.asarray(0., dtype=theano.config.floatX), n_h)],
        non_sequences = [self.W_s[1], self.U_s[1], self.b_s[1], self.vo[1]],
        n_steps = seq_x.shape[0])

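Here seq_x is the input data, with shape (sentence_length, feature_length). The initial c0 and h0 are set to 0 every time (they could of course be randomly initialized and added to params so that they are trained along with the other parameters). _step is executed once for each word.
The mini-batch version of the BiLSTM looks like this: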

    def _step(m_, x, c_, h_, W, U, b, vo):
        # m_: (batch_size,) mask for this time step, 1 = real word, 0 = padding
        # x: (batch_size, feature_length); c_, h_: (batch_size, n_h)
        i = T.nnet.sigmoid(T.dot(x, W[0]) + T.dot(h_, U[0]) + b[0])
        f = T.nnet.sigmoid(T.dot(x, W[1]) + T.dot(h_, U[1]) + b[1])
        c = i * (T.tanh(T.dot(x, W[2]) + T.dot(h_, U[2]) + b[2])) + f * c_
        # Keep the new cell state where m_ is 1, carry the old one where it is 0
        ct = m_[:, None] * c + (1 - m_)[:, None] * c_
        o = T.nnet.sigmoid(T.dot(x, W[3]) + T.dot(h_, U[3]) + T.dot(c, vo) + b[3])
        h = o * T.tanh(c)
        # Same filtering for the hidden state
        ht = m_[:, None] * h + (1 - m_)[:, None] * h_
        return [ct, ht]

    # Forward direction
    [c_l, h_l], updates_l = theano.scan(fn = _step,
        sequences = [mask, seq_x],
        outputs_info = [T.alloc(numpy.asarray(0., dtype=theano.config.floatX), batch_size, n_h),
                        T.alloc(numpy.asarray(0., dtype=theano.config.floatX), batch_size, n_h)],
        non_sequences = [self.W_s[0], self.U_s[0], self.b_s[0], self.vo[0]],
        n_steps = seq_x.shape[0])

    # Backward direction: the mask is reversed together with the input, so the
    # padding is seen first and the zero initial state is carried through it.
    [c_r, h_r], updates_r = theano.scan(fn = _step,
        sequences = [mask[::-1], seq_x[::-1]],
        outputs_info = [T.alloc(numpy.asarray(0., dtype=theano.config.floatX), batch_size, n_h),
                        T.alloc(numpy.asarray(0., dtype=theano.config.floatX), batch_size, n_h)],
        non_sequences = [self.W_s[1], self.U_s[1], self.b_s[1], self.vo[1]],
        n_steps = seq_x.shape[0])

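_step is applied for a fixed number of time steps, so if sentences of different lengths are to go through it together (as a batch), they first have to be brought to the same length. In practice we pick the longest sentence in the batch and pad all shorter sentences to that length, and we use a matrix, mask, to record which positions of each sentence were padded (the padded positions should not contribute to the computation when scan runs).
In the actual code, _step takes one extra variable, m_. It is one row of the mask matrix, which has shape (max_length, batch_size) and consists of 0s and 1s: 1 means the sentence has a word at that position, 0 means the position is padding. Its job shows up in the two lines "ct = m_[:, None] * c + (1 - m_)[:, None] * c_" and "ht = m_[:, None] * h + (1 - m_)[:, None] * h_", which filter the values at each step. If m_ is 1, ct at that position is the newly computed c; if m_ is 0, ct is the previous step's c_ (ht is computed the same way).
seq_x now has shape (max_length, batch_size, feature_length), whereas data is usually collected with shape (batch_size, max_length, feature_length), so remember to reorder the axes before passing it to the LSTM (dimshuffle can do this).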
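The padding and mask construction described above can be sketched in plain NumPy. This is only an illustration under assumptions: the helper name pad_batch and the toy sentences are hypothetical and not from the original code, and each sentence is assumed to already be a (length, feature_length) array.

```python
import numpy as np

def pad_batch(sentences, dtype="float32"):
    """Pad a list of (length_i, feature_length) arrays to a common length.

    Returns seq_x with shape (max_length, batch_size, feature_length)
    and mask with shape (max_length, batch_size).
    """
    batch_size = len(sentences)
    max_length = max(len(s) for s in sentences)
    feature_length = sentences[0].shape[1]

    # Build the batch-major layout first, the way data is usually collected
    x = np.zeros((batch_size, max_length, feature_length), dtype=dtype)
    mask = np.zeros((batch_size, max_length), dtype=dtype)
    for i, s in enumerate(sentences):
        x[i, :len(s)] = s        # real words
        mask[i, :len(s)] = 1.0   # 1 = word, 0 = padding

    # Switch to the time-major layout that scan iterates over; on a Theano
    # tensor the same reordering would be x.dimshuffle(1, 0, 2)
    return x.transpose(1, 0, 2), mask.T

# Three toy sentences of lengths 3, 1 and 2 with 4 features per word
sentences = [np.ones((3, 4)), np.ones((1, 4)), np.ones((2, 4))]
seq_x, mask = pad_batch(sentences)
print(seq_x.shape)  # (max_length, batch_size, feature_length)
print(mask)         # one row per time step, one column per sentence
```

Building the mask with the same dtype as the data (floatX in the Theano code) keeps the products with c and h from being upcast.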
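One last piece of advice: when writing a model in Theano, it is worth adding a function in the main routine purely for testing. It can return whichever intermediate variables you care about, and printing their shapes and actual values lets you confirm that each step is correct. This saves a lot of time.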
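One way to judge the values such a test function prints is to compare them with a small NumPy reference. The sketch below mirrors the masked _step for a single direction; the function name masked_lstm, the random weights and the toy sizes are assumptions made for illustration. It checks the property the mask is meant to guarantee: a short sentence ends with the same hidden state whether it runs alone or padded inside a batch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def masked_lstm(seq_x, mask, W, U, b, vo):
    """NumPy mirror of the masked _step for one direction.

    seq_x: (max_length, batch_size, n_in), mask: (max_length, batch_size)
    W: (4, n_in, n_h), U: (4, n_h, n_h), b: (4, n_h), vo: (n_h, n_h)
    Returns the hidden state after every step: (max_length, batch_size, n_h).
    """
    batch_size, n_h = seq_x.shape[1], U.shape[-1]
    c_ = np.zeros((batch_size, n_h))
    h_ = np.zeros((batch_size, n_h))
    hs = []
    for x, m_ in zip(seq_x, mask):
        i = sigmoid(x @ W[0] + h_ @ U[0] + b[0])
        f = sigmoid(x @ W[1] + h_ @ U[1] + b[1])
        c = i * np.tanh(x @ W[2] + h_ @ U[2] + b[2]) + f * c_
        o = sigmoid(x @ W[3] + h_ @ U[3] + c @ vo + b[3])
        h = o * np.tanh(c)
        # Where the mask is 0, carry the previous state forward unchanged
        c_ = m_[:, None] * c + (1 - m_)[:, None] * c_
        h_ = m_[:, None] * h + (1 - m_)[:, None] * h_
        hs.append(h_)
    return np.stack(hs)

rng = np.random.RandomState(0)
n_in, n_h = 4, 5
W = rng.uniform(-0.1, 0.1, (4, n_in, n_h))
U = rng.uniform(-0.1, 0.1, (4, n_h, n_h))
b = rng.uniform(-0.1, 0.1, (4, n_h))
vo = rng.uniform(-0.1, 0.1, (n_h, n_h))

# A sentence of length 2, run on its own with an all-ones mask
short = rng.uniform(-1, 1, (2, 1, n_in))
alone = masked_lstm(short, np.ones((2, 1)), W, U, b, vo)

# The same sentence padded to length 4, next to a full-length sentence
padded = np.zeros((4, 2, n_in))
padded[:2, :1] = short
padded[:, 1] = rng.uniform(-1, 1, (4, n_in))
mask = np.array([[1, 1], [1, 1], [0, 1], [0, 1]], dtype=float)
batched = masked_lstm(padded, mask, W, U, b, vo)

# The padded steps should leave the short sentence's final state untouched
print(np.allclose(batched[-1, 0], alone[-1, 0]))
```

Feeding the same weights and inputs to the compiled Theano test function and comparing against this reference with numpy.allclose is one way to catch a misplaced mask or a wrong axis order early.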
That's all.

0 0