TensorFlow Model Persistence

Source: Internet; editor: 程序博客网; time: 2024/06/05 05:06

Methods of tf model persistence
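If our neural network is fairly complex and the training set is large, training can take a long time. If an unforeseen error interrupts training partway through, all of that work is lost. To avoid this, we can persist the model (in CKPT format) to checkpoint the intermediate state of training.
If the trained model is to be handed to users for offline prediction, only the forward pass is needed, since all we want is the predicted value. In that case we can persist the model in PB format, saving only the variables the forward pass needs and freezing their values. The user then supplies an input, and the model returns an output.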

1. Saving a model in CKPT format
Define the computation
Declare and obtain a Saver
Save the model with Saver.save

# coding=UTF-8
import tensorflow as tf
import shutil
import os.path

MODEL_DIR = "model/ckpt"
MODEL_NAME = "model.ckpt"

# if os.path.exists(MODEL_DIR):  # delete the directory
#     shutil.rmtree(MODEL_DIR)
if not tf.gfile.Exists(MODEL_DIR):  # create the directory
    tf.gfile.MakeDirs(MODEL_DIR)

# Replace the part below with whatever training you want (CNN, RNN, ...);
# here it is just a simple formula.
# Input placeholder; give it a name, because code that loads the model may need it.
input_holder = tf.placeholder(tf.float32, shape=[1], name="input_holder")
W1 = tf.Variable(tf.constant(5.0, shape=[1]), name="W1")
B1 = tf.Variable(tf.constant(1.0, shape=[1]), name="B1")
_y = (input_holder * W1) + B1
# Output node name, needed when the model is loaded later.
# Returns True if the value is greater than 50, otherwise False.
predictions = tf.greater(_y, 50, name="predictions")

init = tf.global_variables_initializer()
saver = tf.train.Saver()  # the Saver used to save the model

with tf.Session() as sess:
    sess.run(init)
    # Feed one value as a quick test
    print("predictions : ", sess.run(predictions, feed_dict={input_holder: [10.0]}))
    saver.save(sess, os.path.join(MODEL_DIR, MODEL_NAME))  # save the model
    # Number of op nodes in the current graph
    print("%d ops in the final graph." % len(tf.get_default_graph().as_graph_def().node))

for op in tf.get_default_graph().get_operations():  # print node information
    print(op.name, op.values())

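Running it produces the following files:
checkpoint : a list of all model files in the directory
model.ckpt.data-00000-of-00001 : the value of every variable in the model
model.ckpt.index : an index that maps each variable name to its location in the .data file
model.ckpt.meta : the structure of the whole computation graph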

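2. Saving a model in PB format

Define the computation
Get the node information of the current graph with get_default_graph().as_graph_def()
Freeze the values of the relevant nodes with graph_util.convert_variables_to_constants
Persist the model with tf.gfile.GFile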

# coding=UTF-8
import tensorflow as tf
import shutil
import os.path
from tensorflow.python.framework import graph_util

MODEL_DIR = "model/pb"
# if os.path.exists(MODEL_DIR):  # delete the directory
#     shutil.rmtree(MODEL_DIR)
if not tf.gfile.Exists(MODEL_DIR):  # the directory must exist before the .pb file is written
    tf.gfile.MakeDirs(MODEL_DIR)

output_graph = "model/pb/add_model.pb"

# Replace the part below with whatever training you want (CNN, RNN, ...);
# here it is just a simple formula.
input_holder = tf.placeholder(tf.float32, shape=[1], name="input_holder")
W1 = tf.Variable(tf.constant(5.0, shape=[1]), name="W1")
B1 = tf.Variable(tf.constant(1.0, shape=[1]), name="B1")
_y = (input_holder * W1) + B1
# predictions = tf.greater(_y, 50, name="predictions")  # True if greater than 50, otherwise False
predictions = tf.add(_y, 10, name="predictions")  # a simple addition

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    print("predictions : ", sess.run(predictions, feed_dict={input_holder: [10.0]}))
    # The GraphDef of the current graph is enough to compute
    # everything from the input layer to the output layer.
    graph_def = tf.get_default_graph().as_graph_def()
    output_graph_def = graph_util.convert_variables_to_constants(  # freeze the variable values
        sess,
        graph_def,
        ["predictions"]  # names of the nodes to keep
    )
    with tf.gfile.GFile(output_graph, "wb") as f:  # save the model
        f.write(output_graph_def.SerializeToString())  # serialized output
    print("%d ops in the final graph." % len(output_graph_def.node))
    print(predictions)

# for op in tf.get_default_graph().get_operations():  # print node information
#     print(op.name)

*GraphDef: this attribute records the information about the nodes in the TensorFlow computation graph.
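To make the GraphDef note concrete, here is a minimal sketch that rebuilds the toy graph from this section and lists the name and op type of every node in its GraphDef. It assumes TensorFlow 1.15 or 2.x is installed and goes through the tensorflow.compat.v1 module so that the 1.x-style calls used in this article keep working; the helper name list_graph_nodes is made up for this illustration.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: run the 1.x graph-mode API on TF 1.15 / 2.x


def list_graph_nodes():
    """Build the toy add model in a fresh graph and return (name, op type) pairs."""
    graph = tf.Graph()
    with graph.as_default():
        input_holder = tf.placeholder(tf.float32, shape=[1], name="input_holder")
        w1 = tf.Variable(tf.constant(5.0, shape=[1]), name="W1")
        b1 = tf.Variable(tf.constant(1.0, shape=[1]), name="B1")
        tf.add(input_holder * w1 + b1, 10, name="predictions")
    # GraphDef is a protocol buffer; its `node` field holds one NodeDef per op.
    return [(node.name, node.op) for node in graph.as_graph_def().node]


if __name__ == "__main__":
    for name, op_type in list_graph_nodes():
        print(name, op_type)
```

The node names printed here (for example input_holder and predictions) are the same names that the freezing and loading steps refer to.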

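Files in model/pb after running:
add_model.pb
frozen_model.pb (written by the CKPT-to-PB conversion script in section 3, not by this script)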

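add_model.pb : holds the computation graph from the input layer to the output layer together with the values of the relevant variables. Once we have this model, we pass in an input and get a predicted output.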
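The article does not show the loading side, so the following is a hedged sketch of how a frozen .pb file could be read back and used for one prediction. It assumes TensorFlow 1.15 or 2.x with the tensorflow.compat.v1 module; the function names write_frozen_graph and predict_from_pb and the temporary output path are hypothetical, and the first function only repeats the freezing steps from this section so that the example is self-contained.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: run the 1.x graph-mode API on TF 1.15 / 2.x


def write_frozen_graph(pb_path):
    """Rebuild the toy add model and freeze it to pb_path (same steps as above)."""
    graph = tf.Graph()
    with graph.as_default():
        input_holder = tf.placeholder(tf.float32, shape=[1], name="input_holder")
        w1 = tf.Variable(tf.constant(5.0, shape=[1]), name="W1")
        b1 = tf.Variable(tf.constant(1.0, shape=[1]), name="B1")
        tf.add(input_holder * w1 + b1, 10, name="predictions")
        with tf.Session(graph=graph) as sess:
            sess.run(tf.global_variables_initializer())
            frozen = tf.graph_util.convert_variables_to_constants(
                sess, graph.as_graph_def(), ["predictions"])
    with tf.gfile.GFile(pb_path, "wb") as f:
        f.write(frozen.SerializeToString())


def predict_from_pb(pb_path, value):
    """Load a frozen GraphDef from pb_path and run one forward pass."""
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    graph = tf.Graph()
    with graph.as_default():
        # name="" keeps the original node names instead of prefixing "import/".
        tf.import_graph_def(graph_def, name="")
    with tf.Session(graph=graph) as sess:
        # Tensor names have the form "<op name>:<output index>".
        return sess.run("predictions:0", feed_dict={"input_holder:0": [value]})


if __name__ == "__main__":
    pb_file = os.path.join(tempfile.mkdtemp(), "add_model.pb")
    write_frozen_graph(pb_file)
    print(predict_from_pb(pb_file, 10.0))  # 10 * 5 + 1 + 10 = 61
```

No Saver and no variable initialization are needed on the loading side, because the variable values were turned into constants inside the graph when it was frozen.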

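3. Converting CKPT to PB format

Get the model's graph and variable data from the path of the CKPT model
Import the graph stored in the model with import_meta_graph
Restore the value of each variable in the graph with saver.restore
Persist the model with graph_util.convert_variables_to_constants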

# coding=UTF-8
import tensorflow as tf
import os.path
import argparse
from tensorflow.python.framework import graph_util

MODEL_DIR = "model/pb"
MODEL_NAME = "frozen_model.pb"

if not tf.gfile.Exists(MODEL_DIR):  # create the directory
    tf.gfile.MakeDirs(MODEL_DIR)


def freeze_graph(model_folder):
    checkpoint = tf.train.get_checkpoint_state(model_folder)  # check that the directory holds a usable checkpoint
    input_checkpoint = checkpoint.model_checkpoint_path  # path of the ckpt files
    output_graph = os.path.join(MODEL_DIR, MODEL_NAME)  # where the PB model is written
    output_node_names = "predictions"  # name of the output op node in the original model

    # Import the graph. clear_devices: whether or not to clear the device field
    # for an `Operation` or `Tensor` during import.
    saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=True)
    graph = tf.get_default_graph()  # the default graph
    input_graph_def = graph.as_graph_def()  # serialized GraphDef of the current graph

    with tf.Session() as sess:
        saver.restore(sess, input_checkpoint)  # restore the variable values
        # Check that the restored model is correct. Note that the names passed here
        # are the tensor names of the output and input nodes, not the op names.
        print("predictions : ", sess.run("predictions:0", feed_dict={"input_holder:0": [10.0]}))
        output_graph_def = graph_util.convert_variables_to_constants(  # freeze the variable values
            sess,
            input_graph_def,
            output_node_names.split(",")  # separate multiple output nodes with commas
        )
        with tf.gfile.GFile(output_graph, "wb") as f:  # save the model
            f.write(output_graph_def.SerializeToString())  # serialized output
        print("%d ops in the final graph." % len(output_graph_def.node))  # number of op nodes in the frozen graph
        for op in graph.get_operations():
            print(op.name, op.values())


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # Command-line parsing: help is the hint text, type is the argument type.
    # The ckpt directory must be passed when running the program;
    # otherwise argparse reports "error: too few arguments".
    parser.add_argument("model_folder", type=str, help="input ckpt model dir")
    args = parser.parse_args()
    freeze_graph(args.model_folder)
    # freeze_graph("model/ckpt")  # model directory

In this TensorFlow tutorial, I shall explain:

What does a TensorFlow model look like?
How do you save a TensorFlow model?
How do you restore a TensorFlow model for prediction or transfer learning?
How do you work with imported pre-trained models for fine-tuning and modification?
This tutorial assumes that you have some idea about training a neural network. If not, please work through an introductory tutorial first and then come back here.

1. What is a TensorFlow model?
After you have trained a neural network, you will want to save it for future use and for deployment to production. So, what is a TensorFlow model? A TensorFlow model primarily contains the network design (the graph) and the values of the network parameters that we have trained. Hence, a TensorFlow model has two main files:

a) Meta graph:

This is a protocol buffer which saves the complete TensorFlow graph, i.e. all variables, operations, collections, etc. This file has the .meta extension.

b) Checkpoint file:

This is a binary file which contains the values of the weights, biases and all the other saved variables (optimizer state, for example). This file has the extension .ckpt. However, TensorFlow changed this in version 0.11. Now, instead of a single .ckpt file, we have two files:
mymodel.data-00000-of-00001
mymodel.index

The .data file is the one that contains the values of our training variables, and it is the one we shall go after.

Along with this, TensorFlow also has a file named checkpoint, which simply keeps a record of the latest checkpoint files saved.
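The checkpoint file is a small text file, so its contents are easy to inspect. The sketch below parses such a file with the standard library only; it assumes the usual key: "value" line layout with the keys model_checkpoint_path and all_model_checkpoint_paths, and both the function name parse_checkpoint_file and the sample text are illustrative. In real code, tf.train.latest_checkpoint or tf.train.get_checkpoint_state does this for you.

```python
import re

# Each line of the `checkpoint` file looks like:  key: "value"
_LINE = re.compile(r'^\s*(\w+):\s*"(.*)"\s*$')


def parse_checkpoint_file(text):
    """Return (latest_path, all_paths) from the text of a `checkpoint` file."""
    latest = None
    all_paths = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if not match:
            continue  # skip blank or unrecognized lines
        key, value = match.groups()
        if key == "model_checkpoint_path":
            latest = value
        elif key == "all_model_checkpoint_paths":
            all_paths.append(value)
    return latest, all_paths


# Illustrative contents after saving at steps 1000 and 2000.
sample = '''model_checkpoint_path: "my_test_model-2000"
all_model_checkpoint_paths: "my_test_model-1000"
all_model_checkpoint_paths: "my_test_model-2000"
'''

print(parse_checkpoint_file(sample))
# ('my_test_model-2000', ['my_test_model-1000', 'my_test_model-2000'])
```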

So, to summarize, a TensorFlow model for versions greater than 0.10 looks like this:

mymodel.data-00000-of-00001
mymodel.index
mymodel.meta
checkpoint

while a TensorFlow model before 0.11 contained only three files:

inception_v1.meta
inception_v1.ckpt
checkpoint
Now that we know what a TensorFlow model looks like, let’s learn how to save the model.

2. Saving a TensorFlow model:
Let’s say you are training a convolutional neural network for image classification. As a standard practice, you keep a watch on the loss and accuracy numbers. Once you see that the network has converged, you can stop the training manually, or you can let the training run for a fixed number of epochs. After the training is done, we want to save all the variables and the network graph to a file for future use. In TensorFlow, to save the graph and the values of all the parameters, we create an instance of the tf.train.Saver() class.
saver = tf.train.Saver()

Remember that TensorFlow variables are only alive inside a session. So, you have to save the model inside a session by calling the save method on the saver object you just created.

saver.save(sess, 'my-test-model')

Here, sess is the session object, while ‘my-test-model’ is the name you want to give your model. Let’s see a complete example:

import tensorflow as tf

w1 = tf.Variable(tf.random_normal(shape=[2]), name='w1')
w2 = tf.Variable(tf.random_normal(shape=[5]), name='w2')
saver = tf.train.Saver()
sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver.save(sess, 'my_test_model')

# This will save the following files in TensorFlow v >= 0.11:
# my_test_model.data-00000-of-00001
# my_test_model.index
# my_test_model.meta
# checkpoint

If we are saving the model after 1000 iterations, we shall call save by passing the step count:

saver.save(sess, 'my_test_model',global_step=1000)

This will just append ‘-1000’ to the model name, and the following files will be created:

my_test_model-1000.index
my_test_model-1000.meta
my_test_model-1000.data-00000-of-00001
checkpoint

Let’s say that, while training, we save our model every 1000 iterations. The .meta file is created the first time (at the 1000th iteration), and since the graph does not change, there is no need to recreate it at 2000, 3000 or any later iteration. From then on we only need to save the variable values. Hence, when we don’t want to write the meta graph, we use this:

saver.save(sess, 'my-model', global_step=step,write_meta_graph=False)
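As a quick check of this behaviour, the sketch below saves one tiny model twice, the second time with write_meta_graph=False, and lists the resulting files. It assumes TensorFlow 1.15 or 2.x with the tensorflow.compat.v1 module; the function name save_with_and_without_meta, the single variable and the temporary directory are all made up for the illustration.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: run the 1.x graph-mode API on TF 1.15 / 2.x


def save_with_and_without_meta(ckpt_dir):
    """Save at steps 1000 and 2000, writing the .meta file only the first time."""
    graph = tf.Graph()
    with graph.as_default():
        tf.Variable(tf.zeros([2]), name="w1")
        saver = tf.train.Saver()
        with tf.Session(graph=graph) as sess:
            sess.run(tf.global_variables_initializer())
            prefix = os.path.join(ckpt_dir, "my-model")
            saver.save(sess, prefix, global_step=1000)
            saver.save(sess, prefix, global_step=2000, write_meta_graph=False)
    return sorted(os.listdir(ckpt_dir))


if __name__ == "__main__":
    for filename in save_with_and_without_meta(tempfile.mkdtemp()):
        print(filename)
```

The listing should contain my-model-1000.meta but no my-model-2000.meta, while the .index and .data files exist for both steps.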

If you want to keep only the 4 latest models and, in addition, keep one model for every 2 hours of training, you can use max_to_keep and keep_checkpoint_every_n_hours like this. Note that these options only control which checkpoints are retained; you still call saver.save() yourself.

# keeps the 4 most recent checkpoints, plus one checkpoint for every 2 hours of training
saver = tf.train.Saver(max_to_keep=4, keep_checkpoint_every_n_hours=2)
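The effect of max_to_keep can be seen with a short experiment: save six times and look at which checkpoints are still recorded. This is a sketch under the assumption that TensorFlow 1.15 or 2.x is available through tensorflow.compat.v1; the function name surviving_steps, the prefix "model" and the temporary directory are hypothetical.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: run the 1.x graph-mode API on TF 1.15 / 2.x


def surviving_steps(ckpt_dir, max_to_keep=4, num_saves=6):
    """Save num_saves checkpoints and return the steps that are still kept."""
    graph = tf.Graph()
    with graph.as_default():
        tf.Variable(tf.zeros([2]), name="w1")
        saver = tf.train.Saver(max_to_keep=max_to_keep)
        with tf.Session(graph=graph) as sess:
            sess.run(tf.global_variables_initializer())
            for step in range(1, num_saves + 1):
                saver.save(sess, os.path.join(ckpt_dir, "model"),
                           global_step=step, write_meta_graph=False)
    # The `checkpoint` file records the checkpoints the Saver has not deleted.
    state = tf.train.get_checkpoint_state(ckpt_dir)
    return [int(os.path.basename(path).rsplit("-", 1)[1])
            for path in state.all_model_checkpoint_paths]


if __name__ == "__main__":
    print(surviving_steps(tempfile.mkdtemp()))
```

With six saves and max_to_keep=4, only steps 3 to 6 remain; the files for steps 1 and 2 are deleted as newer checkpoints are written.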

Note that if we don’t specify anything in tf.train.Saver(), it saves all the variables. What if we don’t want to save all the variables, but just some of them? We can specify the variables or collections we want to save: when creating the tf.train.Saver instance, we pass it a list or a dictionary of the variables that we want to save. Let’s look at an example:

import tensorflow as tf

w1 = tf.Variable(tf.random_normal(shape=[2]), name='w1')
w2 = tf.Variable(tf.random_normal(shape=[5]), name='w2')
saver = tf.train.Saver([w1, w2])
sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver.save(sess, 'my_test_model', global_step=1000)
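The dictionary form mentioned above maps the name stored in the checkpoint to the variable, which also lets you save a variable under a different name. The following sketch saves only w1 under the key renamed_w1 and then lists what the checkpoint contains with tf.train.list_variables. It assumes TensorFlow 1.15 or 2.x through tensorflow.compat.v1; the key renamed_w1, the function name save_renamed and the temporary directory are invented for the example.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: run the 1.x graph-mode API on TF 1.15 / 2.x


def save_renamed(ckpt_dir):
    """Save only w1, under a different checkpoint name, and list the checkpoint."""
    graph = tf.Graph()
    with graph.as_default():
        w1 = tf.Variable(tf.zeros([2]), name="w1")
        tf.Variable(tf.zeros([5]), name="w2")  # deliberately left out of the Saver
        # Dictionary form: {name in the checkpoint: variable to save}
        saver = tf.train.Saver({"renamed_w1": w1})
        with tf.Session(graph=graph) as sess:
            sess.run(tf.global_variables_initializer())
            path = saver.save(sess, os.path.join(ckpt_dir, "subset_model"))
    # Returns a list of (name, shape) pairs stored in the checkpoint.
    return tf.train.list_variables(path)


if __name__ == "__main__":
    for name, shape in save_renamed(tempfile.mkdtemp()):
        print(name, shape)
```

A Saver built with the same dictionary is needed to restore such a checkpoint, since the stored name no longer matches the variable name in the graph.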

3. Importing a pre-trained model:
    If you want to use someone else’s pre-trained model for fine-tuning, there are two things you need to do:

a) Create the network:

You can create the network by writing Python code that builds each and every layer manually, as in the original model. However, if you think about it, we already saved the network in the .meta file, which we can use to recreate the network with the tf.train.import_meta_graph() function like this: saver = tf.train.import_meta_graph('my_test_model-1000.meta')

Remember, import_meta_graph appends the network defined in the .meta file to the current graph. So, this will create the graph/network for you, but we still need to load the values of the parameters that we trained on this graph.

b) Load the parameters:

We can restore the parameters of the network by calling restore on this saver which is an instance of tf.train.Saver() class.

with tf.Session() as sess:
    new_saver = tf.train.import_meta_graph('my_test_model-1000.meta')
    new_saver.restore(sess, tf.train.latest_checkpoint('./'))

After this, the values of tensors like w1 and w2 have been restored and can be accessed:

with tf.Session() as sess:
    saver = tf.train.import_meta_graph('my_test_model-1000.meta')
    saver.restore(sess, tf.train.latest_checkpoint('./'))
    print(sess.run('w1:0'))
# The model has been restored. The statement above prints the saved value of w1.

So, now you have understood how saving and importing work for a TensorFlow model. The next section describes a practical use of the above to load any pre-trained model.

4. Working with restored models
Now that you have understood how to save and restore TensorFlow models, let’s develop a practical guide to restoring any pre-trained model and using it for prediction, fine-tuning or further training. Whenever you work with TensorFlow, you define a graph which is fed examples (training data) and some hyperparameters such as the learning rate and the global step. It is standard practice to feed all the training data and hyperparameters using placeholders. Let’s build a small network using placeholders and save it. Note that when the network is saved, the values of the placeholders are not saved.

import tensorflow as tf

# Prepare to feed input, i.e. feed_dict and placeholders
w1 = tf.placeholder("float", name="w1")
w2 = tf.placeholder("float", name="w2")
b1 = tf.Variable(2.0, name="bias")
feed_dict = {w1: 4, w2: 8}

# Define a test operation that we will restore
w3 = tf.add(w1, w2)
w4 = tf.multiply(w3, b1, name="op_to_restore")
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Create a saver object which will save all the variables
saver = tf.train.Saver()

# Run the operation by feeding input
print(sess.run(w4, feed_dict))
# Prints 24.0, which is (w1 + w2) * b1

# Now, save the graph
saver.save(sess, 'my_test_model', global_step=1000)

Now, when we want to restore it, we not only have to restore the graph and weights, but also prepare a new feed_dict that will feed the new training data to the network. We can get a reference to these saved operations and placeholder variables via the graph.get_tensor_by_name() method.

# How to access a saved variable/tensor/placeholder
w1 = graph.get_tensor_by_name("w1:0")

# How to access a saved operation
op_to_restore = graph.get_tensor_by_name("op_to_restore:0")

If we just want to run the same network with different data, we can simply pass the new data to the network via feed_dict.

import tensorflow as tf

sess = tf.Session()
# First, let's load the meta graph and restore the weights
saver = tf.train.import_meta_graph('my_test_model-1000.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))

# Now, let's access the placeholders and
# create a feed_dict to feed new data
graph = tf.get_default_graph()
w1 = graph.get_tensor_by_name("w1:0")
w2 = graph.get_tensor_by_name("w2:0")
feed_dict = {w1: 13.0, w2: 17.0}

# Now, access the op that you want to run
op_to_restore = graph.get_tensor_by_name("op_to_restore:0")

print(sess.run(op_to_restore, feed_dict))
# This will print 60.0, which is calculated
# using the new values of w1 and w2 and the saved value of b1.

What if you want to add more operations to the graph by adding more layers, and then train it? Of course you can do that too. See here:

import tensorflow as tf

sess = tf.Session()
# First, let's load the meta graph and restore the weights
saver = tf.train.import_meta_graph('my_test_model-1000.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))

# Now, let's access the placeholders and
# create a feed_dict to feed new data
graph = tf.get_default_graph()
w1 = graph.get_tensor_by_name("w1:0")
w2 = graph.get_tensor_by_name("w2:0")
feed_dict = {w1: 13.0, w2: 17.0}

# Now, access the op that you want to run
op_to_restore = graph.get_tensor_by_name("op_to_restore:0")

# Add more to the current graph
add_on_op = tf.multiply(op_to_restore, 2)

print(sess.run(add_on_op, feed_dict))
# This will print 120.0.

But can you restore part of the old graph and add on to it for fine-tuning? Of course you can: just access the appropriate operation with the graph.get_tensor_by_name() method and build the graph on top of it. Here is a real-world example, in which we load a pre-trained VGG network using its meta graph and change the number of outputs in the last layer to 2 for fine-tuning with new data.

............
saver = tf.train.import_meta_graph('vgg.meta')
# Access the graph
graph = tf.get_default_graph()
## Prepare the feed_dict for feeding data for fine-tuning

# Access the appropriate output for fine-tuning
fc7 = graph.get_tensor_by_name('fc7:0')

# Use this if you only want to change gradients of the last layer
fc7 = tf.stop_gradient(fc7)  # identity in the forward pass; blocks gradients in the backward pass
fc7_shape = fc7.get_shape().as_list()

num_outputs = 2
# The last dimension of fc7 is its feature size
# (this assumes fc7 is a 2-D [batch, features] tensor, as tf.matmul below requires).
weights = tf.Variable(tf.truncated_normal([fc7_shape[-1], num_outputs], stddev=0.05))
biases = tf.Variable(tf.constant(0.05, shape=[num_outputs]))
output = tf.matmul(fc7, weights) + biases
pred = tf.nn.softmax(output)

# Now, you run this with fine-tuning data in sess.run()