TensorFlow RNN Example

Source: Internet | Published by: 数据精灵 | Edited by: 程序博客网 | Date: 2024/05/16 08:04

This example is based on the Basic LSTM RNN tutorial on Google's official TensorFlow website. It focuses on code analysis, including data preprocessing.

reader.py

The _read_words function

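It reads a PTB file as UTF-8, replaces every newline with <eos>, and splits the text into a list of words. You can test this in an interactive Python shell with the following command: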

    with tf.gfile.GFile("/home/gsc/envtensorflow/deep_learn/models/tutorials/simple-examples/data/ptb.train.txt", "r") as f:
      # Python 2 style: decode bytes to text (under Python 3, f.read() already returns str)
      data = f.read().decode("utf-8").replace("\n", "<eos>").split()
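The following minimal Python 3 sketch applies the same replace-and-split to in-memory text, so you can check what this line does. The function name read_words and the sample text are made up for illustration, and io.StringIO stands in for the PTB file. <eos> only becomes a separate token when whitespace comes before the newline. For that reason the sample lines end with a space, which is assumed here to mimic the layout of the PTB files.

```python
import io


def read_words(f):
    """Sketch of _read_words: each newline becomes the token <eos>, then split on whitespace."""
    return f.read().replace("\n", "<eos>").split()


# Hypothetical two-line sample standing in for ptb.train.txt (lines end with a space)
sample = io.StringIO(" aer banknote berlitz \n the cat sat \n")
words = read_words(sample)
print(words)
```

Without the trailing space, a line ending in "sat\n" would produce the single token "sat<eos>".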

Figure ptb_1

The _build_vocab function

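First, count how many times each word occurs. counter behaves like a dict: each key is a word and each value is the number of times that word occurs.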

counter = collections.Counter(data)

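sorted then produces a list of (word, count) tuples, ordered by count in descending order. Words with the same count are ordered alphabetically. See Figure ptb_2.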

count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))

Figure ptb_2

The words and their counts are then stored separately in words and _. By convention, the name _ marks a value that will not be used.

words, _ = list(zip(*count_pairs))

Each word is then numbered by its position in this order, starting from 0. The mapping is stored in the dict word_to_id; see Figure ptb_3.

word_to_id = dict(zip(words, range(len(words))))

Figure ptb_3

Finally, the function returns word_to_id, which maps each word to its id, e.g. {'the': 0, '<unk>': 1, ..., 'federal': 100}.
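The steps of _build_vocab can be put together and run on a toy corpus. This is a sketch: build_vocab is a hypothetical stand-in that takes a word list instead of a file name, and the corpus is invented. It shows the descending-count order and the alphabetical tie-break.

```python
import collections


def build_vocab(data):
    """Sketch of _build_vocab: map each word to its frequency rank (0 = most frequent)."""
    counter = collections.Counter(data)
    # Sort by descending count; ties are broken alphabetically by the word itself
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))
    words, _ = list(zip(*count_pairs))
    return dict(zip(words, range(len(words))))


# Made-up toy corpus, not PTB data
data = "the cat sat on the mat <eos> the dog sat <eos>".split()
word_to_id = build_vocab(data)
print(word_to_id)
```

Here "<eos>" and "sat" both occur twice. "<eos>" gets the smaller id because "<" sorts before "s".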

train_data = _file_to_word_ids(train_path, word_to_id)

Expanded, _file_to_word_ids is equivalent to:

    train_data = []
    for word in data:
      if word in word_to_id:
        train_data.append(word_to_id[word])

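train_data holds the index in word_to_id of every word in the file, in order. For example, aer has index 9970 in word_to_id, so the first element of train_data is train_data[0] = 9970, and so on.
The vocabulary size (the number of distinct words) is: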

vocabulary = len(word_to_id)
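The expansion above and the vocabulary size can be checked with a toy vocabulary. In this sketch, file_to_word_ids is a hypothetical name that takes an already-read word list instead of a file path, and the vocabulary is invented. Any word missing from word_to_id is silently skipped.

```python
def file_to_word_ids(data, word_to_id):
    """Sketch of _file_to_word_ids: replace each word by its id, skipping unknown words."""
    return [word_to_id[word] for word in data if word in word_to_id]


# Hypothetical vocabulary and word sequence for illustration
word_to_id = {"the": 0, "<unk>": 1, "<eos>": 2, "cat": 3}
data = ["the", "cat", "purred", "<eos>"]
train_data = file_to_word_ids(data, word_to_id)
vocabulary = len(word_to_id)  # number of distinct words, not the length of the text
print(train_data, vocabulary)
```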

The ptb_producer function

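Arguments: raw_data is train_data, batch_size = 20, and num_steps = 20. The analysis uses the small configuration.
First, the raw data (the Python list of word ids) is converted into the tensor that TensorFlow needs; see Figure ptb_5.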
Figure ptb_5

raw_data = tf.convert_to_tensor(raw_data, name="raw_data", dtype=tf.int32)

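The 1-D data is then reshaped into batch_size = 20 rows and batch_len columns. batch_len is the total data length integer-divided by batch_size, so any words left over at the end are dropped.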

    data_len = tf.size(raw_data)
    batch_len = data_len // batch_size
    data = tf.reshape(raw_data[0 : batch_size * batch_len],
                      [batch_size, batch_len])
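The following pure-Python sketch of this reshape shows which ids end up in which row, using toy sizes. batchify is a hypothetical name. Like tf.reshape, it fills the matrix row by row.

```python
def batchify(raw_data, batch_size):
    """Pure-Python sketch of the reshape step in ptb_producer.

    Keeps the first batch_size * batch_len ids and lays them out row by row
    as a batch_size x batch_len matrix; the leftover tail is discarded.
    """
    batch_len = len(raw_data) // batch_size
    return [raw_data[r * batch_len:(r + 1) * batch_len] for r in range(batch_size)]


# Toy ids 0..10 with batch_size=3 (the real run uses train_data and batch_size=20)
data = batchify(list(range(11)), batch_size=3)
for row in data:
    print(row)
```

Each row is one contiguous stretch of the corpus. Consecutive num_steps-wide windows of a row therefore continue the same text, which is why the LSTM state can be carried from one step to the next.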

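The i created here plays the role of the loop variable i in a C for loop. shuffle=False means the indices are not shuffled, so i takes the values 0, 1, ..., epoch_size - 1 in order, where epoch_size = (batch_len - 1) // num_steps.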

i = tf.train.range_input_producer(epoch_size, shuffle=False).dequeue()

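This slices data into a block of shape [batch_size - 0, (i+1)*num_steps - i*num_steps] = [20, 20]. At step i it takes columns i*num_steps through (i+1)*num_steps - 1, so data is cut into consecutive blocks of 20 columns.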

    x = tf.strided_slice(data, [0, i * num_steps],
                         [batch_size, (i + 1) * num_steps])

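y is built almost the same way as x, except that the start and end columns of the slice are both shifted by +1. Hence y[:, n] = x[:, n+1] for n < num_steps - 1, and the last column of y is the word right after x's window. In other words, y is x moved one position to the left. This makes sense: x is the training input and y is the label, and the label at each position is the next word the model should predict.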
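The following pure-Python sketch shows the (x, y) pairs ptb_producer yields, with a plain for loop playing the role of the queue index i. ptb_windows is a hypothetical name and the sizes are toy values. epoch_size = (batch_len - 1) // num_steps is, to my reading, the same formula the tutorial's reader uses. The -1 leaves room for the one-column shift of y.

```python
def ptb_windows(raw_data, batch_size, num_steps):
    """Pure-Python sketch of what ptb_producer yields, one (x, y) pair per step i.

    x takes columns [i*num_steps, (i+1)*num_steps) of the batched data;
    y takes the same window shifted right by one column.
    """
    batch_len = len(raw_data) // batch_size
    data = [raw_data[r * batch_len:(r + 1) * batch_len] for r in range(batch_size)]
    epoch_size = (batch_len - 1) // num_steps
    for i in range(epoch_size):
        x = [row[i * num_steps:(i + 1) * num_steps] for row in data]
        y = [row[i * num_steps + 1:(i + 1) * num_steps + 1] for row in data]
        yield x, y


# Toy example: ids 0..19, batch_size=2, num_steps=3
pairs = list(ptb_windows(list(range(20)), batch_size=2, num_steps=3))
for x, y in pairs:
    print("x =", x, " y =", y)
```

Within each row, every label in y is simply the id that follows the corresponding input in x.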

ptb_word_lm.py

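Getting the configuration

The configuration is obtained with

config = get_config()

which returns the parameters for the configured model mode. Here the small mode is assumed: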

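    class SmallConfig(object):
      """Small config."""
      init_scale = 0.1      # weights initialized uniformly in [-init_scale, init_scale]
      learning_rate = 1.0   # initial learning rate
      max_grad_norm = 5     # gradients are clipped to this global norm
      num_layers = 2        # number of stacked LSTM layers
      num_steps = 20        # number of unrolled time steps
      hidden_size = 200     # LSTM units per layer (also the embedding size)
      max_epoch = 4         # epochs trained with the initial learning rate
      max_max_epoch = 13    # total number of training epochs
      keep_prob = 1.0       # dropout keep probability (1.0 = no dropout)
      lr_decay = 0.5        # per-epoch decay factor applied after max_epoch
      batch_size = 20
      vocab_size = 10000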

The PTBInput class: its members self.input_data and self.targets are both [20, 20] (batch_size × num_steps) tensors; see Figure ptb_6.

Figure ptb_6

train_input = PTBInput(config=config, data=train_data, name="TrainInput")

train_input is an instance of this class. Later code accesses its members like this:

    train_input.targets
    train_input.input_data

class PTBModel(object): the core of the training model

basic Cell

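This is the other core module, after data preprocessing.
In TensorFlow 1.0, the constructor of tf.contrib.rnn.BasicLSTMCell has no reuse parameter. The parameter was added in newer source code (as of March 17, 2017). The code below checks which version is installed, so it works with both: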

    def lstm_cell():
      # Pass reuse only if the installed BasicLSTMCell accepts it (newer versions)
      if 'reuse' in inspect.getargspec(
          tf.contrib.rnn.BasicLSTMCell.__init__).args:
        return tf.contrib.rnn.BasicLSTMCell(
            size, forget_bias=0.0, state_is_tuple=True,
            reuse=tf.get_variable_scope().reuse)
      else:
        return tf.contrib.rnn.BasicLSTMCell(
            size, forget_bias=0.0, state_is_tuple=True)

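attn_cell does not hold a BasicLSTMCell instance. It is bound to the function lstm_cell itself, so every call attn_cell() creates a new BasicLSTMCell.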

attn_cell = lstm_cell

The LSTM implemented here is the most basic LSTM, described in the paper at http://arxiv.org/abs/1409.2329. Its equations are pasted directly:
Figure ptb_8

Figure ptb_9 (with dropout)
The structure:

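Figure ptb_10
To make the tensors and dimensions in Figure ptb_10 easier to follow, the equations are restated below in the layout of the figure. This partly repeats the previous post; see http://blog.csdn.net/shichaog/article/details/72853665.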

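$$
\begin{aligned}
h_t &= o_t \odot \tanh(c_t) \\
c_t &= f_t \odot c_{t-1} + i_t \odot j_t \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \\
j_t &= \tanh(W_{xj} x_t + W_{hj} h_{t-1} + b_j) \\
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)
\end{aligned}
$$
Here $\odot$ is element-wise multiplication. The forget gate $f_t$ uses $\sigma$ and the candidate $j_t$ uses tanh. This matches new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * tanh(j) in the BasicLSTMCell source in the appendix, which also adds forget_bias to f before the sigmoid.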
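Next, the equations above are regrouped: the weights are combined into one large matrix and the inputs are concatenated. Writing each sample as a row, as the code does, the pre-activations $i'_t, j'_t, f'_t, o'_t$ are computed as
$$
\begin{bmatrix} i'_t & j'_t & f'_t & o'_t \end{bmatrix}
= \underbrace{\begin{bmatrix} x_t & h_{t-1} \end{bmatrix}}_{20 \times 400}
\underbrace{\begin{bmatrix} W_{xi} & W_{xj} & W_{xf} & W_{xo} \\ W_{hi} & W_{hj} & W_{hf} & W_{ho} \end{bmatrix}}_{400 \times 800}
+ \begin{bmatrix} b_i & b_j & b_f & b_o \end{bmatrix}
$$
Here $x_t$ and $h_{t-1}$ are each $20 \times 200$ (batch_size × hidden_size), and each weight block is $200 \times 200$. The gates then follow as $i_t = \sigma(i'_t)$, $j_t = \tanh(j'_t)$, $f_t = \sigma(f'_t)$, and $o_t = \sigma(o'_t)$.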
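This derivation gives the dimension relationships inside basic_lstm_cell_1. The 20×800 product is split into four 20×200 matrices, and σ or tanh is then applied to each of them inside basic_lstm_cell. These are exactly the operations of the basic LSTM.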
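The following pure-Python sketch of one BasicLSTMCell-style step makes the concatenate-multiply-split computation concrete. It uses tiny sizes (batch 2, input 3, 4 units) instead of 20×200, and all-zero weights so the result can be verified by hand. forget_bias=0.0 matches the lstm_cell code above. The helper names are hypothetical; TensorFlow itself does this with _linear and array_ops.split, as shown in the appendix.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matmul(a, b):
    """Plain matrix product of lists of lists: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(a[r][t] * b[t][col] for t in range(len(b))) for col in range(len(b[0]))]
            for r in range(len(a))]


def basic_lstm_step(x, h, c, W, b, forget_bias=0.0):
    """One step of a BasicLSTMCell-style LSTM on a batch.

    x: batch x input_size, h and c: batch x num_units,
    W: (input_size + num_units) x (4 * num_units), b: length 4 * num_units.
    """
    n = len(h[0])
    xh = [x_row + h_row for x_row, h_row in zip(x, h)]  # concat [x_t, h_{t-1}] along columns
    z = matmul(xh, W)                                    # batch x 4n pre-activations
    new_h, new_c = [], []
    for z_row, c_row in zip(z, c):
        z_row = [v + bias for v, bias in zip(z_row, b)]
        # Split in the order i, j, f, o, as BasicLSTMCell does
        i, j, f, o = (z_row[k * n:(k + 1) * n] for k in range(4))
        c_next = [cp * sigmoid(fv + forget_bias) + sigmoid(iv) * math.tanh(jv)
                  for cp, iv, jv, fv in zip(c_row, i, j, f)]
        h_next = [math.tanh(cv) * sigmoid(ov) for cv, ov in zip(c_next, o)]
        new_c.append(c_next)
        new_h.append(h_next)
    return new_h, new_c


# Tiny sizes instead of 20 x 200 so the pure-Python loops stay fast
batch, input_size, units = 2, 3, 4
x = [[0.1 * (r + k) for k in range(input_size)] for r in range(batch)]
h = [[0.0] * units for _ in range(batch)]
c = [[1.0] * units for _ in range(batch)]
W = [[0.0] * (4 * units) for _ in range(input_size + units)]  # all-zero weights
b = [0.0] * (4 * units)
new_h, new_c = basic_lstm_step(x, h, c, W, b)
print(len(new_h), len(new_h[0]))
print(new_c[0][0], new_h[0][0])
```

With zero weights every gate pre-activation is 0, so each gate is σ(0) = 0.5 and the candidate is tanh(0) = 0. The new cell state is therefore 0.5 · c = 0.5, and h = 0.5 · tanh(0.5). With the real sizes, [x_t, h_{t-1}] would be 20×400 and W would be 400×800.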
Opening TensorBoard shows the following details:
Figure ptb_11
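The edges between split and the nodes in the layer above it correspond, from left to right, to f_t, i_t, j_t, and o_t. In Figure ptb_10, i_t and j_t are multiplied to give node mul1, and mul is the product of f_t and c_{t-1}. Then the two tensors c_t and h_t are passed on to multi_rnn_cell_1, and h_t together with the input x_t is passed to basic_lstm_cell_1.
In short, c_t and h_t of cell_0 in multi_rnn_cell are passed to c_t and h_t of cell_0 in multi_rnn_cell_1. Likewise, c_t and h_t of cell_1 in multi_rnn_cell are passed to c_t and h_t of cell_1 in multi_rnn_cell_1.
Because config.num_layers equals 2, the list comprehension below calls attn_cell() twice, creating two LSTM layers.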

    cell = tf.contrib.rnn.MultiRNNCell(
        [attn_cell() for _ in range(config.num_layers)], state_is_tuple=True)
    self._initial_state = cell.zero_state(batch_size, data_type())

The resulting structure is shown in Figure ptb_12.

Figure ptb_12
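cell.zero_state returns a nested structure rather than a single tensor. The following pure-Python sketch mimics that structure for the small configuration: a tuple with one (c, h) pair per layer. LSTMStateTuple here is a namedtuple stand-in for TensorFlow's class of the same name, and lists of zeros stand in for tensors. run_epoch iterates over this structure with for i, (c, h) in enumerate(model.initial_state).

```python
from collections import namedtuple

# Stand-in for TensorFlow's LSTMStateTuple (for illustration only, not the TF class)
LSTMStateTuple = namedtuple("LSTMStateTuple", ["c", "h"])


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def multi_rnn_zero_state(num_layers, batch_size, num_units):
    """Sketch of the shape of MultiRNNCell.zero_state with state_is_tuple=True:
    one (c, h) pair per layer, each a batch_size x num_units matrix of zeros."""
    return tuple(LSTMStateTuple(c=zeros(batch_size, num_units), h=zeros(batch_size, num_units))
                 for _ in range(num_layers))


initial_state = multi_rnn_zero_state(num_layers=2, batch_size=20, num_units=200)
for i, (c, h) in enumerate(initial_state):
    print(i, len(c), len(c[0]), len(h), len(h[0]))
```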

    with tf.device("/cpu:0"):
      embedding = tf.get_variable(
          "embedding", [vocab_size, size], dtype=data_type())
      inputs = tf.nn.embedding_lookup(embedding, input_.input_data)

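See Figures ptb_13 and ptb_14. embedding is a [vocab_size, size] = [10000, 200] variable, and inputs has shape [batch_size, num_steps, size] = [20, 20, 200].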
Figure ptb_13

Figure ptb_14
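The lookup replaces every word id with the corresponding row of embedding, which adds a trailing dimension of size `size`. The following pure-Python sketch shows that behaviour on a made-up embedding table with toy sizes. embedding_lookup here is a hypothetical stand-in, not TensorFlow's function.

```python
def embedding_lookup(embedding, ids):
    """Sketch of tf.nn.embedding_lookup for a 2-D id matrix: replace every id with its row."""
    return [[embedding[word_id] for word_id in row] for row in ids]


# Toy sizes: vocab_size=5, size=3, batch_size=2, num_steps=4 (the tutorial uses 10000, 200, 20, 20)
vocab_size, size = 5, 3
embedding = [[float(10 * w + k) for k in range(size)] for w in range(vocab_size)]
input_data = [[0, 4, 4, 1], [2, 3, 0, 0]]
inputs = embedding_lookup(embedding, input_data)
print(len(inputs), len(inputs[0]), len(inputs[0][0]))  # batch_size, num_steps, size
print(inputs[0][1])
```

In the model, inputs[:, time_step, :] then selects the [batch_size, size] input for a single time step.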

RNN stacking

Next, a variable scope named RNN is created, and the cell is unrolled over num_steps time steps:

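    outputs = []
    state = self._initial_state
    with tf.variable_scope("RNN"):
      for time_step in range(num_steps):
        # After the first step, reuse the same LSTM weights at every time step
        if time_step > 0: tf.get_variable_scope().reuse_variables()
        (cell_output, state) = cell(inputs[:, time_step, :], state)
        outputs.append(cell_output)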

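Figure ptb_15
The loop runs 20 times in total, once per time_step. Figure ptb_15 captures only a few of these steps so that the details stay readable. Between every two multi_rnn_cell nodes there are four tensors: c_t and h_t of cell0, and c_t and h_t of cell1. In total, 20 output tensors are produced, each of shape 20×200.
Figure ptb_16
Then stack and reshape turn them into a 400×200 matrix (400 = batch_size × num_steps). Because the stack is along axis=1, the intermediate tensor has shape [20, 20, 200] (batch, time, unit). Row b*num_steps + t of the result is therefore the output for sequence b at time step t.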

    output = tf.reshape(tf.stack(axis=1, values=outputs), [-1, size])
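The following pure-Python sketch of stack(axis=1) followed by reshape([-1, size]) makes the row ordering visible. stack_axis1_and_flatten is a hypothetical name, and each toy value encodes its time step and batch index.

```python
def stack_axis1_and_flatten(outputs):
    """Sketch of tf.reshape(tf.stack(axis=1, values=outputs), [-1, size]).

    outputs: list of num_steps matrices, each batch_size x size.
    stack(axis=1) gives [batch_size, num_steps, size]; flattening the first two
    axes puts (batch b, time t) at row b * num_steps + t.
    """
    num_steps, batch_size = len(outputs), len(outputs[0])
    stacked = [[outputs[t][b] for t in range(num_steps)] for b in range(batch_size)]
    return [vec for per_batch in stacked for vec in per_batch]


# Toy: num_steps=3, batch_size=2, size=1; each value encodes (t, b) as 10*t + b
outputs = [[[10 * t + b] for b in range(2)] for t in range(3)]
output = stack_axis1_and_flatten(outputs)
print(output)
```

tf.reshape(input_.targets, [-1]) flattens the [batch_size, num_steps] targets in the same row-major order. Row k of the logits computed from output is therefore scored against target k.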

Next, the softmax weights and bias are defined:

    softmax_w = tf.get_variable(
        "softmax_w", [size, vocab_size], dtype=data_type())
    softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=data_type())

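Figure ptb_17
Their shapes are shown in Figure ptb_17: softmax_w is [size, vocab_size] = [200, 10000], and softmax_b is [vocab_size] = [10000].
At this point, you can see how embedding, the RNN, w, and b fit together: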

Figure ptb_18

Loss function

    logits = tf.matmul(output, softmax_w) + softmax_b
    loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example(
        [logits],
        [tf.reshape(input_.targets, [-1])],
        [tf.ones([batch_size * num_steps], dtype=data_type())])
    self._cost = cost = tf.reduce_sum(loss) / batch_size
    self._final_state = state

sequence_loss_by_example belongs to the seq2seq module. It is not explained here and is left for the seq2seq post.
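With one logits tensor and all weights equal to 1, sequence_loss_by_example amounts (as I understand it) to one softmax cross-entropy per row of logits, i.e. per predicted word. The following pure-Python sketch recomputes the cost under that assumption. The function names are hypothetical, and this is not TensorFlow's implementation.

```python
import math


def sparse_softmax_cross_entropy(logits_row, target):
    """-log softmax(logits_row)[target], computed stably by subtracting the max logit."""
    m = max(logits_row)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits_row))
    return log_sum - logits_row[target]


def ptb_cost(logits, targets, batch_size):
    """Sketch of the cost above with all weights equal to 1: one cross-entropy per
    row of logits (batch_size * num_steps rows), summed and divided by batch_size."""
    losses = [sparse_softmax_cross_entropy(row, t) for row, t in zip(logits, targets)]
    return sum(losses) / batch_size


# Toy check: batch_size=2, num_steps=3, vocab_size=10, all logits equal (uniform prediction)
batch_size, num_steps, vocab_size = 2, 3, 10
logits = [[0.0] * vocab_size for _ in range(batch_size * num_steps)]
targets = [1, 5, 9, 0, 2, 3]
cost = ptb_cost(logits, targets, batch_size)
print(cost, num_steps * math.log(vocab_size))
```

Dividing by batch_size leaves the summed loss of one sequence of num_steps words. This is why run_epoch later divides the accumulated cost by num_steps per step to get a per-word figure.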

Learning-rate update

    self._lr = tf.Variable(0.0, trainable=False)
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars),
                                      config.max_grad_norm)

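To deal with exploding gradients, the gradients are clipped. If the global norm of all gradients exceeds max_grad_norm (5), every gradient is scaled down by the same factor so that the norm becomes max_grad_norm. Clipping limits large gradients; it does not address vanishing gradients.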
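The following pure-Python sketch shows the scaling rule documented for tf.clip_by_global_norm: each tensor is multiplied by clip_norm / max(global_norm, clip_norm). Gradients are modelled as flat lists of numbers, and the function is a hypothetical stand-in, not the TensorFlow code.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients by clip_norm / max(global_norm, clip_norm).

    Returns the clipped gradients and the global norm before clipping."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads], global_norm


# Two toy "gradient tensors" with global norm sqrt(6^2 + 8^2) = 10, clipped to max_grad_norm = 5
clipped, norm = clip_by_global_norm([[6.0], [8.0]], clip_norm=5.0)
print(norm, clipped)
```

All gradients are scaled by the same factor, so the direction of the update is kept and only its length is capped. Gradients whose global norm is already below 5 pass through unchanged.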

Training procedure

for i in range(config.max_max_epoch):

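The whole training set is traversed 13 (max_max_epoch) times.
At the start of each epoch the model's learning rate is set. It stays at learning_rate for the first max_epoch epochs, and after that it is multiplied by lr_decay once per epoch.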

    lr_decay = config.lr_decay ** max(i + 1 - config.max_epoch, 0.0)
    m.assign_lr(session, config.learning_rate * lr_decay)

    print("Epoch: %d Learning rate: %.3f" % (i + 1, session.run(m.lr)))
    train_perplexity = run_epoch(session, m, eval_op=m.train_op,
                                 verbose=True)
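The following pure-Python sketch evaluates the lr_decay formula for the small configuration, listing the learning rate for all 13 epochs. learning_rates is a hypothetical helper; only the formula is taken from the code above.

```python
def learning_rates(learning_rate, lr_decay, max_epoch, max_max_epoch):
    """Learning rate used in each epoch, following the lr_decay formula above."""
    rates = []
    for i in range(max_max_epoch):
        decay = lr_decay ** max(i + 1 - max_epoch, 0.0)
        rates.append(learning_rate * decay)
    return rates


# Small config: learning_rate=1.0, lr_decay=0.5, max_epoch=4, max_max_epoch=13
for epoch, lr in enumerate(learning_rates(1.0, 0.5, 4, 13), start=1):
    print("Epoch: %d Learning rate: %.3f" % (epoch, lr))
```

Epochs 1 to 4 use 1.0. From epoch 5 the rate halves every epoch, reaching 0.5 ** 9 in epoch 13.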

run_epoch

The function first obtains the model's initial state:

  state = session.run(model.initial_state)

It then puts the model's cost and final state into the fetches dict, plus eval_op when one is given (train_op during training):

    fetches = {
        "cost": model.cost,
        "final_state": model.final_state,
    }
    if eval_op is not None:
      fetches["eval_op"] = eval_op

Training loop

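    costs = 0.0
    iters = 0
    # Runs model.input.epoch_size steps, one per [batch_size, num_steps] slice.
    # epoch_size = ((len(data) // batch_size) - 1) // num_steps; it is not 13
    # (13 is max_max_epoch, the number of epochs).
    for step in range(model.input.epoch_size):
      feed_dict = {}
      # Feed each layer's LSTM state (c_t and h_t) left by the previous step.
      # enumerate yields i = 0, 1, ... together with each layer's (c, h) state tuple.
      for i, (c, h) in enumerate(model.initial_state):
        feed_dict[c] = state[i].c
        feed_dict[h] = state[i].h
      # Run the graph to get the values of model.cost and model.final_state
      vals = session.run(fetches, feed_dict)
      cost = vals["cost"]
      state = vals["final_state"]
      costs += cost
      iters += model.input.num_steps
    return np.exp(costs / iters)

cost and state are read out in order to compute the perplexity. Perplexity can be thought of as the number of candidate words the model is effectively choosing among at each position, so smaller is better.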
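The loop accumulates costs and iters, and the tutorial's run_epoch then returns np.exp(costs / iters). The following pure-Python sketch checks the "number of candidate words" reading. A model that always predicts uniformly over 10000 words has a per-word loss of log(10000), so its perplexity comes out as 10000. The numbers are toy values.

```python
import math


def perplexity(costs, iters):
    """Perplexity from accumulated cost: exp(total cost / total number of words)."""
    return math.exp(costs / iters)


# Uniform model over 10000 words: each word costs log(10000)
vocab_size, num_steps, epoch_size = 10000, 20, 5
cost_per_step = num_steps * math.log(vocab_size)  # cost = summed loss / batch_size
costs = epoch_size * cost_per_step
iters = epoch_size * num_steps
print(perplexity(costs, iters))
```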

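Appendix: BasicLSTMCell source code

This code is not hard. With Figures ptb_10 and ptb_11 it should be easy to follow, so it is not analyzed line by line here.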

    class BasicLSTMCell(RNNCell):
      """Basic LSTM recurrent network cell.

      The implementation is based on: http://arxiv.org/abs/1409.2329.

      We add forget_bias (default: 1) to the biases of the forget gate in order to
      reduce the scale of forgetting in the beginning of the training.

      It does not allow cell clipping, a projection layer, and does not
      use peep-hole connections: it is the basic baseline.

      For advanced models, please use the full LSTMCell that follows.
      """

      def __init__(self, num_units, forget_bias=1.0, input_size=None,
                   state_is_tuple=True, activation=tanh, reuse=None):
        """Initialize the basic LSTM cell.

        Args:
          num_units: int, The number of units in the LSTM cell.
          forget_bias: float, The bias added to forget gates (see above).
          input_size: Deprecated and unused.
          state_is_tuple: If True, accepted and returned states are 2-tuples of
            the `c_state` and `m_state`.  If False, they are concatenated
            along the column axis.  The latter behavior will soon be deprecated.
          activation: Activation function of the inner states.
          reuse: (optional) Python boolean describing whether to reuse variables
            in an existing scope.  If not `True`, and the existing scope already has
            the given variables, an error is raised.
        """
        if not state_is_tuple:
          logging.warn("%s: Using a concatenated state is slower and will soon be "
                       "deprecated.  Use state_is_tuple=True.", self)
        if input_size is not None:
          logging.warn("%s: The input_size parameter is deprecated.", self)
        self._num_units = num_units
        self._forget_bias = forget_bias
        self._state_is_tuple = state_is_tuple
        self._activation = activation
        self._reuse = reuse

      @property
      def state_size(self):
        return (LSTMStateTuple(self._num_units, self._num_units)
                if self._state_is_tuple else 2 * self._num_units)

      @property
      def output_size(self):
        return self._num_units

      def __call__(self, inputs, state, scope=None):
        """Long short-term memory cell (LSTM)."""
        with _checked_scope(self, scope or "basic_lstm_cell", reuse=self._reuse):
          # Parameters of gates are concatenated into one multiply for efficiency.
          if self._state_is_tuple:
            c, h = state
          else:
            c, h = array_ops.split(value=state, num_or_size_splits=2, axis=1)
          concat = _linear([inputs, h], 4 * self._num_units, True)

          # i = input_gate, j = new_input, f = forget_gate, o = output_gate
          i, j, f, o = array_ops.split(value=concat, num_or_size_splits=4, axis=1)

          new_c = (c * sigmoid(f + self._forget_bias) + sigmoid(i) *
                   self._activation(j))
          new_h = self._activation(new_c) * sigmoid(o)

          if self._state_is_tuple:
            new_state = LSTMStateTuple(new_c, new_h)
          else:
            new_state = array_ops.concat([new_c, new_h], 1)
          return new_h, new_state
