TensorFlow Example: Implementing an LSTM-Based Language Model
Source: Internet. Editor: 程序博客网. Time: 2024/05/17 09:30
RNN
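People do not start thinking from scratch each time; they keep some results of earlier thinking to support the current decision. In conversation, for example, we interpret a sentence from its context rather than analyzing every sentence from scratch. A traditional neural network cannot do this, which is arguably one of its major shortcomings. A convolutional neural network can classify images, for instance, but it cannot readily relate what happens across the frames of a video, because it has no way to use information from the previous frame. A recurrent neural network (RNN) addresses this problem.
In the diagram of an RNN, x is the input, s is a node of the RNN, and o is the output. We feed data x to the RNN, the network computes the output o, and some information (the state) is fed back into the network as part of its next input. Comparing o with the label gives an error, and with that error the network can be trained using gradient descent and Back-Propagation Through Time (BPTT). BPTT is similar to the standard BP method used to train feed-forward networks: it also uses backpropagation to compute gradients and update the network's weights. Another method, Real-Time Recurrent Learning (RTRL), computes the gradients in the forward direction, but its computational cost is comparatively high.
Once unrolled, an RNN resembles a chain of ordinary neural networks with a sequence of inputs x and a sequence of outputs o, where each network in the chain passes information to the next. This chained structure is a natural fit for processing and analyzing time-series data. Note that every level of the unrolled network uses the same parameters, so we do not need to train the parameters of hundreds or thousands of layers, only those of a single RNN layer. This is the elegant part of the design, and the idea of sharing parameters is similar to weight sharing in convolutional networks.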
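The unrolling and parameter sharing described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used later in the article: the function name `rnn_forward`, the weight names, and the layer sizes are all assumptions chosen for the example.

```python
import numpy as np

def rnn_forward(xs, h0, W_xh, W_hh, W_hy, b_h, b_y):
    """Run a vanilla RNN over a sequence, reusing the same weights at every step."""
    h = h0
    outputs = []
    for x in xs:                                # one iteration per time step
        h = np.tanh(x @ W_xh + h @ W_hh + b_h)  # new state from input and previous state
        outputs.append(h @ W_hy + b_y)          # output o for this step
    return np.stack(outputs), h

# Illustrative (hypothetical) sizes; only one set of weights exists no matter how long the sequence is.
input_size, hidden_size, output_size, num_steps = 3, 4, 2, 5
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(input_size, hidden_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(hidden_size, output_size))
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

xs = rng.normal(size=(num_steps, input_size))
outputs, final_state = rnn_forward(xs, np.zeros(hidden_size), W_xh, W_hh, W_hy, b_h, b_y)
print(outputs.shape, final_state.shape)  # (5, 2) (4,)
```

Making the sequence longer adds loop iterations but no new parameters, which is the sharing the paragraph refers to.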
LSTM
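For some simple problems, a small amount of the most recent sequence information is enough. Some harder problems, however, need information from much earlier, possibly from the very start of the sequence, and a plain RNN has difficulty remembering inputs that lie too far back. Long-term dependencies are therefore the critical weakness of the traditional RNN.
The LSTM was designed specifically to handle long-term dependencies. It does not need especially elaborate hyperparameter tuning; remembering long-term information is its default behavior.
The internal structure of an LSTM is more complex than that of a plain RNN and contains four neural network layers. In the diagram, the small circles are point-wise operations such as vector addition and element-wise multiplication, while the small boxes each represent a neural network layer with learnable parameters.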
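One way to see why distant inputs are hard for a plain RNN to remember is to look at how the gradient of the final state with respect to the initial state shrinks as the sequence gets longer. The sketch below uses a deliberately tiny scalar RNN, h_t = tanh(w * h_{t-1}); the function name and the values w = 0.9 and h0 = 0.5 are assumptions made only for this illustration.

```python
import math

def state_gradient(num_steps, w=0.9, h0=0.5):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(num_steps):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)  # chain rule factor for this step
    return grad

for steps in (1, 5, 20, 50):
    print(steps, state_gradient(steps))
```

Every factor in the product is smaller than 1 here, so the gradient decays roughly geometrically with the number of steps, and the training signal from early inputs fades.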
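- The straight line running along the top of the LSTM cell represents the LSTM state. It passes through all the chained LSTM cells, flowing from the first cell to the last, with only a few linear interventions and changes along the way.
- As the state travels along this channel, an LSTM cell can add information to it or remove information from it. These modifications to the information flow are controlled by the Gates in the LSTM.
- Each Gate consists of a Sigmoid layer and an element-wise multiplication. The output of the Sigmoid layer is a value between 0 and 1, which directly controls what proportion of the information is passed through.
- Each LSTM cell contains three such Gates, which maintain and control the cell's state. By storing and modifying this state, the LSTM cell can achieve long-term memory.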
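The points above can be made concrete with a single LSTM step written in NumPy. This is a sketch under stated assumptions, not the TensorFlow implementation: the function `lstm_step`, the stacked weight layout (input gate, candidate, forget gate, output gate), and the sizes are all chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step; W maps [x, h_prev] to four stacked layers (assumed order: i, g, f, o)."""
    z = np.concatenate([x, h_prev]) @ W + b
    i, g, f, o = np.split(z, 4)
    i = sigmoid(i)           # input gate: how much new information to add
    g = np.tanh(g)           # candidate values to add to the state
    f = sigmoid(f)           # forget gate: how much of the old state to keep
    o = sigmoid(o)           # output gate: how much of the state to expose
    c = f * c_prev + i * g   # state update uses only element-wise operations
    h = o * np.tanh(c)
    return h, c

input_size, hidden_size = 3, 4   # illustrative sizes
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(input_size + hidden_size, 4 * hidden_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)

# With the forget gate saturated near 1 and the input gate near 0,
# the state passes through the cell almost unchanged.
W_closed = np.zeros_like(W)
b_closed = np.concatenate([
    np.full(hidden_size, -20.0),  # input gate pushed towards 0
    np.zeros(hidden_size),        # candidate
    np.full(hidden_size, 20.0),   # forget gate pushed towards 1
    np.zeros(hidden_size),        # output gate
])
c_prev = np.array([0.3, -1.2, 2.0, 0.0])
_, c_kept = lstm_step(np.ones(input_size), np.zeros(hidden_size), c_prev, W_closed, b_closed)
print(np.allclose(c_kept, c_prev, atol=1e-6))
```

The second half shows the mechanism behind long-term memory: when the gates leave the state alone, it is carried forward from step to step with almost no change.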
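Implementing an LSTM in TensorFlow
Next we use an LSTM to implement a language model. Given the preceding context, that is, the words that have appeared so far, a language model predicts the probability of the next word. The dataset used is PTB (Penn Treebank).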
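As a rough illustration of what predicting the probability of the next word means, and of how such a model is usually scored, the sketch below turns per-step scores (logits) into probabilities with a softmax and computes perplexity, the exponential of the average negative log-probability given to the correct next word. The helper names and the toy logits are assumptions for the example.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution over the vocabulary."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def perplexity(step_logits, targets):
    """exp of the average negative log-probability assigned to the correct next word."""
    nll = 0.0
    for logits, target in zip(step_logits, targets):
        nll -= math.log(softmax(logits)[target])
    return math.exp(nll / len(targets))

targets = [1, 2, 3]  # indices of the true next words in a toy 4-word vocabulary

# A model that knows nothing spreads probability evenly over the 4 words.
uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in targets]
print(perplexity(uniform_logits, targets))    # close to 4, the vocabulary size

# A model that strongly favours the correct word at every step.
confident_logits = [[10.0 if k == t else 0.0 for k in range(4)] for t in targets]
print(perplexity(confident_logits, targets))  # close to 1
```

Lower perplexity means the model is less surprised by the actual next word. The listing's `run_epoch` reports the same kind of quantity as `np.exp(costs / iters)`.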
#%%
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

import time

import numpy as np
import tensorflow as tf

import reader

#flags = tf.flags
#logging = tf.logging

#flags.DEFINE_string("save_path", None,
#                    "Model output directory.")
#flags.DEFINE_bool("use_fp16", False,
#                  "Train using 16-bit floats instead of 32bit floats")

#FLAGS = flags.FLAGS


#def data_type():
#    return tf.float16 if FLAGS.use_fp16 else tf.float32


class PTBInput(object):
    """The input data."""

    def __init__(self, config, data, name=None):
        self.batch_size = batch_size = config.batch_size
        self.num_steps = num_steps = config.num_steps
        self.epoch_size = ((len(data) // batch_size) - 1) // num_steps
        self.input_data, self.targets = reader.ptb_producer(
            data, batch_size, num_steps, name=name)


class PTBModel(object):
    """The PTB model."""

    def __init__(self, is_training, config, input_):
        self._input = input_

        batch_size = input_.batch_size
        num_steps = input_.num_steps
        size = config.hidden_size
        vocab_size = config.vocab_size

        # Slightly better results can be obtained with forget gate biases
        # initialized to 1 but the hyperparameters of the model would need to be
        # different than reported in the paper.
        def lstm_cell():
            return tf.contrib.rnn.BasicLSTMCell(
                size, forget_bias=0.0, state_is_tuple=True)

        attn_cell = lstm_cell
        if is_training and config.keep_prob < 1:
            def attn_cell():
                return tf.contrib.rnn.DropoutWrapper(
                    lstm_cell(), output_keep_prob=config.keep_prob)

        cell = tf.contrib.rnn.MultiRNNCell(
            [attn_cell() for _ in range(config.num_layers)], state_is_tuple=True)

        self._initial_state = cell.zero_state(batch_size, tf.float32)

        with tf.device("/cpu:0"):
            embedding = tf.get_variable(
                "embedding", [vocab_size, size], dtype=tf.float32)
            inputs = tf.nn.embedding_lookup(embedding, input_.input_data)

        if is_training and config.keep_prob < 1:
            inputs = tf.nn.dropout(inputs, config.keep_prob)

        # Simplified version of models/tutorials/rnn/rnn.py's rnn().
        # This builds an unrolled LSTM for tutorial purposes only.
        # In general, use the rnn() or state_saving_rnn() from rnn.py.
        #
        # The alternative version of the code below is:
        #
        # inputs = tf.unstack(inputs, num=num_steps, axis=1)
        # outputs, state = tf.nn.rnn(cell, inputs,
        #                            initial_state=self._initial_state)
        outputs = []
        state = self._initial_state
        with tf.variable_scope("RNN"):
            for time_step in range(num_steps):
                if time_step > 0:
                    tf.get_variable_scope().reuse_variables()
                (cell_output, state) = cell(inputs[:, time_step, :], state)
                outputs.append(cell_output)

        output = tf.reshape(tf.concat(outputs, 1), [-1, size])
        softmax_w = tf.get_variable(
            "softmax_w", [size, vocab_size], dtype=tf.float32)
        softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=tf.float32)
        logits = tf.matmul(output, softmax_w) + softmax_b
        loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example(
            [logits],
            [tf.reshape(input_.targets, [-1])],
            [tf.ones([batch_size * num_steps], dtype=tf.float32)])
        self._cost = cost = tf.reduce_sum(loss) / batch_size
        self._final_state = state

        if not is_training:
            return

        self._lr = tf.Variable(0.0, trainable=False)
        tvars = tf.trainable_variables()
        grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars),
                                          config.max_grad_norm)
        optimizer = tf.train.GradientDescentOptimizer(self._lr)
        self._train_op = optimizer.apply_gradients(
            zip(grads, tvars),
            global_step=tf.contrib.framework.get_or_create_global_step())

        self._new_lr = tf.placeholder(
            tf.float32, shape=[], name="new_learning_rate")
        self._lr_update = tf.assign(self._lr, self._new_lr)

    def assign_lr(self, session, lr_value):
        session.run(self._lr_update, feed_dict={self._new_lr: lr_value})

    @property
    def input(self):
        return self._input

    @property
    def initial_state(self):
        return self._initial_state

    @property
    def cost(self):
        return self._cost

    @property
    def final_state(self):
        return self._final_state

    @property
    def lr(self):
        return self._lr

    @property
    def train_op(self):
        return self._train_op


class SmallConfig(object):
    """Small config."""
    init_scale = 0.1
    learning_rate = 1.0
    max_grad_norm = 5
    num_layers = 2
    num_steps = 20
    hidden_size = 200
    max_epoch = 4
    max_max_epoch = 13
    keep_prob = 1.0
    lr_decay = 0.5
    batch_size = 20
    vocab_size = 10000


class MediumConfig(object):
    """Medium config."""
    init_scale = 0.05
    learning_rate = 1.0
    max_grad_norm = 5
    num_layers = 2
    num_steps = 35
    hidden_size = 650
    max_epoch = 6
    max_max_epoch = 39
    keep_prob = 0.5
    lr_decay = 0.8
    batch_size = 20
    vocab_size = 10000


class LargeConfig(object):
    """Large config."""
    init_scale = 0.04
    learning_rate = 1.0
    max_grad_norm = 10
    num_layers = 2
    num_steps = 35
    hidden_size = 1500
    max_epoch = 14
    max_max_epoch = 55
    keep_prob = 0.35
    lr_decay = 1 / 1.15
    batch_size = 20
    vocab_size = 10000


class TestConfig(object):
    """Tiny config, for testing."""
    init_scale = 0.1
    learning_rate = 1.0
    max_grad_norm = 1
    num_layers = 1
    num_steps = 2
    hidden_size = 2
    max_epoch = 1
    max_max_epoch = 1
    keep_prob = 1.0
    lr_decay = 0.5
    batch_size = 20
    vocab_size = 10000


def run_epoch(session, model, eval_op=None, verbose=False):
    """Runs the model on the given data."""
    start_time = time.time()
    costs = 0.0
    iters = 0
    state = session.run(model.initial_state)

    fetches = {
        "cost": model.cost,
        "final_state": model.final_state,
    }
    if eval_op is not None:
        fetches["eval_op"] = eval_op

    for step in range(model.input.epoch_size):
        feed_dict = {}
        for i, (c, h) in enumerate(model.initial_state):
            feed_dict[c] = state[i].c
            feed_dict[h] = state[i].h

        vals = session.run(fetches, feed_dict)
        cost = vals["cost"]
        state = vals["final_state"]

        costs += cost
        iters += model.input.num_steps

        if verbose and step % (model.input.epoch_size // 10) == 10:
            print("%.3f perplexity: %.3f speed: %.0f wps" %
                  (step * 1.0 / model.input.epoch_size, np.exp(costs / iters),
                   iters * model.input.batch_size / (time.time() - start_time)))

    return np.exp(costs / iters)


raw_data = reader.ptb_raw_data('simple-examples/data/')
train_data, valid_data, test_data, _ = raw_data

config = SmallConfig()
eval_config = SmallConfig()
eval_config.batch_size = 1
eval_config.num_steps = 1

with tf.Graph().as_default():
    initializer = tf.random_uniform_initializer(-config.init_scale,
                                                config.init_scale)

    with tf.name_scope("Train"):
        train_input = PTBInput(config=config, data=train_data, name="TrainInput")
        with tf.variable_scope("Model", reuse=None, initializer=initializer):
            m = PTBModel(is_training=True, config=config, input_=train_input)
        #tf.scalar_summary("Training Loss", m.cost)
        #tf.scalar_summary("Learning Rate", m.lr)

    with tf.name_scope("Valid"):
        valid_input = PTBInput(config=config, data=valid_data, name="ValidInput")
        with tf.variable_scope("Model", reuse=True, initializer=initializer):
            mvalid = PTBModel(is_training=False, config=config, input_=valid_input)
        #tf.scalar_summary("Validation Loss", mvalid.cost)

    with tf.name_scope("Test"):
        test_input = PTBInput(config=eval_config, data=test_data, name="TestInput")
        with tf.variable_scope("Model", reuse=True, initializer=initializer):
            mtest = PTBModel(is_training=False, config=eval_config,
                             input_=test_input)

    sv = tf.train.Supervisor()
    with sv.managed_session() as session:
        for i in range(config.max_max_epoch):
            lr_decay = config.lr_decay ** max(i + 1 - config.max_epoch, 0.0)
            m.assign_lr(session, config.learning_rate * lr_decay)

            print("Epoch: %d Learning rate: %.3f" % (i + 1, session.run(m.lr)))
            train_perplexity = run_epoch(session, m, eval_op=m.train_op,
                                         verbose=True)
            print("Epoch: %d Train Perplexity: %.3f" % (i + 1, train_perplexity))
            valid_perplexity = run_epoch(session, mvalid)
            print("Epoch: %d Valid Perplexity: %.3f" % (i + 1, valid_perplexity))

        test_perplexity = run_epoch(session, mtest)
        print("Test Perplexity: %.3f" % test_perplexity)

        # if FLAGS.save_path:
        #     print("Saving model to %s." % FLAGS.save_path)
        #     sv.saver.save(session, FLAGS.save_path, global_step=sv.global_step)


#if __name__ == "__main__":
#    tf.app.run()
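Two training details in the listing are easy to miss: the learning rate stays constant for the first `max_epoch` epochs and then decays geometrically, and the gradients are rescaled together when their joint norm exceeds `max_grad_norm`. The NumPy sketch below reproduces both ideas outside TensorFlow. The function names are hypothetical, the default values are taken from `SmallConfig`, and `clip_by_global_norm` here is a plain re-implementation of the scaling rule, not the TensorFlow function itself.

```python
import numpy as np

def learning_rate(epoch, base_lr=1.0, lr_decay=0.5, max_epoch=4):
    """Learning rate for a 0-based epoch index, following the training loop in the listing."""
    return base_lr * lr_decay ** max(epoch + 1 - max_epoch, 0)

def clip_by_global_norm(grads, clip_norm):
    """Rescale all gradients together so their joint L2 norm is at most clip_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)  # equals 1 when no clipping is needed
    return [g * scale for g in grads], global_norm

# Learning rates for the 13 epochs (max_max_epoch) of SmallConfig.
schedule = [learning_rate(i) for i in range(13)]
print(schedule[:6])  # [1.0, 1.0, 1.0, 1.0, 0.5, 0.25]

# Toy gradients with a joint norm of 13, clipped to max_grad_norm = 5.
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm_before = clip_by_global_norm(grads, clip_norm=5.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(norm_before, norm_after)
```

Because every gradient is multiplied by the same factor, clipping changes the size of the update but not its direction.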