[Deep Learning] Getting Started with TensorFlow (Python)

Overview

TensorFlow is a programming system that uses graphs to represent computation tasks. The nodes in a graph are called ops (short for operations). An op takes zero or more Tensors, performs some computation, and produces zero or more Tensors. Each Tensor is a typed multi-dimensional array. For example, a small batch of images can be represented as a four-dimensional array of floats whose dimensions are [batch, height, width, channels].

A TensorFlow graph describes a computation, but to actually compute anything the graph must be launched in a Session. The session places the graph's ops onto devices such as CPUs or GPUs and provides methods to execute them. These methods return the resulting tensors: in Python they come back as numpy ndarray objects; in C and C++ they are tensorflow::Tensor instances.
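
A minimal sketch of these ideas (my own illustration, not from the original post; the shapes and values are made up): a constant op holding a dummy 4-D image batch is evaluated in a session, and the returned value is indeed a numpy ndarray.

import numpy as np
import tensorflow as tf

# a dummy "batch" of 2 images, 28x28 pixels, 3 channels: [batch, height, width, channels]
images = tf.constant(np.zeros([2, 28, 28, 3], dtype=np.float32))
doubled = images * 2  # an op that takes one Tensor and produces one Tensor

with tf.Session() as sess:
    result = sess.run(doubled)

print(type(result))   # <class 'numpy.ndarray'>
print(result.shape)   # (2, 28, 28, 3)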

Basic concepts:

  • Computation tasks are represented as graphs.
  • Graphs are executed in the context of a Session.
  • Data is represented as tensors.
  • State is maintained with Variables.
  • feed and fetch let you supply data to, or retrieve data from, arbitrary operations (see the sketch after this list).
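
A minimal sketch of fetch and feed (my own illustration, not from the original post): sess.run() can fetch several tensors at once, and feed_dict supplies a value for a placeholder at run time.

import tensorflow as tf

a = tf.placeholder(tf.float32, name='a')   # value is fed in at run time
b = tf.Variable(3.0, name='b')             # state maintained between runs
c = a * b
d = a + b

with tf.Session() as sess:
    sess.run(b.initializer)
    # fetch two tensors in one run, feeding a value for the placeholder
    c_val, d_val = sess.run([c, d], feed_dict={a: 2.0})
    print(c_val, d_val)  # 6.0 5.0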

Official installation guide

Graphs and Sessions

Creating a graph and running a session

The following code creates a graph:

import tensorflow as tf

x = tf.Variable(5, name='x')
y = tf.Variable(2, name='y')
f = x*x*y + y + 10

The code above builds the computation graph but does not perform any computation. To evaluate the graph, you need to open a TensorFlow Session, use it to initialize the variables, and then evaluate f:

sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
print(sess.run(f))
sess.close()

With many variables, sess.run() appears over and over. Instead, we can use a with block, which sets the session as the default session:

with tf.Session() as sess:
    x.initializer.run()  # equivalent to tf.get_default_session().run(x.initializer)
    y.initializer.run()
    result = f.eval()    # equivalent to calling tf.get_default_session().run(f)
    print(result)        # the session is closed automatically at the end of the with block

The code above initializes each variable by hand. We can instead use global_variables_initializer(), which creates a single node that initializes all variables when it is run (it does not perform the initialization immediately):

init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    result = f.eval()
    print(result)

Managing graphs

The code so far uses the default graph. If you need to run code in a separate, independent graph, you can create one yourself:

import tensorflow as tf

x1 = tf.Variable(1)
print(x1.graph is tf.get_default_graph())  # True

graph = tf.Graph()  # an independent Graph
with graph.as_default():
    x2 = tf.Variable(2)
print(x2.graph is tf.get_default_graph())  # False

Node lifecycle

A variable's lifetime starts when its initializer is run and ends when the session is closed. When several tensors are evaluated separately, their common dependencies are recomputed each time:

import tensorflow as tf

w = tf.constant(3)
x = w + 2
y = x + 3
z = x + 4

# w and x are evaluated twice
with tf.Session() as sess:
    print(y.eval())
    print(z.eval())

# w and x are evaluated only once
with tf.Session() as sess:
    y_eval, z_eval = sess.run([y, z])
    print(y_eval)
    print(z_eval)
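
To make the lifetime point concrete, here is a small sketch (my own illustration, not from the original post): a Variable keeps its state across sess.run() calls within one session, but a new session has to re-initialize it.

import tensorflow as tf

counter = tf.Variable(0, name='counter')
increment = tf.assign(counter, counter + 1)

with tf.Session() as sess:
    sess.run(counter.initializer)
    for _ in range(3):
        sess.run(increment)
    print(counter.eval())  # 3 -- the value lives as long as this session

with tf.Session() as sess:
    sess.run(counter.initializer)  # a new session starts from scratch
    print(counter.eval())  # 0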

Example: Linear Regression with TensorFlow

Computing θ with the Normal Equation

For linear regression we compute θ using the Normal Equation:

θ = (XᵀX)⁻¹ Xᵀ y

We use the California housing dataset from sklearn (fetch_california_housing) for the demonstration. The code is as follows:

import tensorflow as tf
import numpy as np
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_with_bias = np.c_[np.ones([m, 1]), housing.data]

X = tf.constant(housing_data_with_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)  # (X^T * X)^-1 * X^T * y

with tf.Session() as sess:
    theta_value = theta.eval()
    print(theta_value)

Output:

[[ -3.74651413e+01]
 [  4.35734153e-01]
 [  9.33829229e-03]
 [ -1.06622010e-01]
 [  6.44106984e-01]
 [ -4.25131839e-06]
 [ -3.77322501e-03]
 [ -4.26648885e-01]
 [ -4.40514028e-01]]
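
As a sanity check (not part of the original post), the same θ can be computed with plain NumPy; the result should match the TensorFlow output above up to float32 rounding. This snippet reuses housing_data_with_bias and housing from the previous code block.

import numpy as np

X_np = housing_data_with_bias
y_np = housing.target.reshape(-1, 1)
theta_np = np.linalg.inv(X_np.T.dot(X_np)).dot(X_np.T).dot(y_np)
print(theta_np)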

Implementing Gradient Descent

Next, we replace the Normal Equation above with gradient descent:

import tensorflow as tf
import numpy as np
import numpy.random as rnd
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import fetch_california_housing
from datetime import datetime

scaler = StandardScaler()
housing = fetch_california_housing()
m, n = housing.data.shape
scale_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones([m, 1]), scale_housing_data]

# ### Computing the gradients (Batch) ###
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1.0, seed=42), name='theta')
y_pred = tf.matmul(X, theta, name='predictions')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')

# gradients = 2/m * tf.matmul(tf.transpose(X), error)   # ① manually computed gradients
# training_op = tf.assign(theta, theta - gradients * learning_rate)

# gradients = tf.gradients(mse, [theta])[0]             # ② gradients via autodiff
# training_op = tf.assign(theta, theta - gradients * learning_rate)

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)  # ③ gradient descent optimizer
# optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.25)  # other optimizers also work
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    # saver.restore(sess, 'my_model_final.ckpt')
    sess.run(init)
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
            save_path = saver.save(sess, '/tmp/my_model.ckpt')
        sess.run(training_op)
    best_theta = theta.eval()
    save_path = saver.save(sess, "my_model_final.ckpt")

print("Best theta:")
print(best_theta)

Manually computing the gradients

gradients = 2/m * tf.matmul(tf.transpose(X), error)   # ① manually computed gradients
training_op = tf.assign(theta, theta - gradients * learning_rate)
  • tf.random_uniform() generates random values.
  • tf.assign() assigns a new value to a variable; in "① manually computed gradients" we use it to apply the update step θ(next step) = θ − η ∇θ MSE(θ) (see the sketch after this list).
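
A minimal sketch of what tf.assign() does (my own illustration, not from the original post): running the assign op replaces the variable's current value, which is exactly how the update step is applied at each iteration.

import tensorflow as tf

theta = tf.Variable([[1.0], [2.0]], name='theta')
gradients = tf.constant([[0.5], [0.5]])
learning_rate = 0.1
update = tf.assign(theta, theta - learning_rate * gradients)

with tf.Session() as sess:
    sess.run(theta.initializer)
    sess.run(update)
    print(theta.eval())  # [[0.95], [1.95]]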

Gradient descent with autodiff

If the gradients are derived by hand, the code for a deep neural network quickly becomes long and error-prone. Instead, the equations for the partial derivatives can be found automatically; TensorFlow's autodiff feature does exactly this. The main solution looks like this:

gradients = tf.gradients(mse, [theta])[0]             # ② gradients via autodiff
training_op = tf.assign(theta, theta - gradients * learning_rate)
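
As a quick check (not in the original post) that tf.gradients() really computes partial derivatives via autodiff, here is a tiny sketch differentiating a simple function:

import tensorflow as tf

x = tf.Variable(3.0)
f = x ** 2 + 2 * x          # df/dx = 2x + 2
grad = tf.gradients(f, [x])[0]

with tf.Session() as sess:
    sess.run(x.initializer)
    print(sess.run(grad))   # 8.0  (= 2*3 + 2)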

Gradient descent with an optimizer

TensorFlow also provides a number of ready-made optimizers. Our code uses tf.train.GradientDescentOptimizer(), but other optimizers such as tf.train.MomentumOptimizer() work as well:

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)  # ③ gradient descent optimizer
# optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.25)  # other optimizers also work
training_op = optimizer.minimize(mse)

Saving and restoring models

saver = tf.train.Saver()
[...]
save_path = saver.save(sess, '/tmp/my_model.ckpt')
[...]
saver.restore(sess, 'my_model_final.ckpt')
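
For completeness, here is a small end-to-end sketch (my own example, with a hypothetical checkpoint path) of saving a variable at the end of one session and restoring it in another; after restoring, there is no need to run the initializer:

import tensorflow as tf

theta = tf.Variable([[1.0], [2.0]], name='theta')
init = tf.global_variables_initializer()
saver = tf.train.Saver()

# train, then save
with tf.Session() as sess:
    sess.run(init)
    save_path = saver.save(sess, '/tmp/demo_model.ckpt')

# restore later
with tf.Session() as sess:
    saver.restore(sess, '/tmp/demo_model.ckpt')
    print(theta.eval())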

Mini-batch Gradient Descent: feeding data step by step

To implement Mini-batch Gradient Descent, X and y must be replaced with a new batch at every iteration. The simplest way is to use tf.placeholder():

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
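
A minimal sketch of how a placeholder behaves (my own illustration, not from the original post): it holds no data itself, and each sess.run() must feed a value of a compatible shape through feed_dict.

import tensorflow as tf

A = tf.placeholder(tf.float32, shape=(None, 3))  # None: any number of rows
B = A + 5

with tf.Session() as sess:
    B_val_1 = sess.run(B, feed_dict={A: [[1, 2, 3]]})
    B_val_2 = sess.run(B, feed_dict={A: [[4, 5, 6], [7, 8, 9]]})
    print(B_val_1)  # [[6. 7. 8.]]
    print(B_val_2)  # [[ 9. 10. 11.]
                    #  [12. 13. 14.]]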

At each iteration, the batch is supplied through the feed_dict parameter:

X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

The full code is as follows:

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")  # specifying None for a dimension means "any size"
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1, seed=42), name='theta')
y_pred = tf.matmul(X, theta, name='predictions')
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name='mse')
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)
init = tf.global_variables_initializer()

rnd.seed(42)

def fetch_batch(epoch, batch_index, batch_size):
    rnd.seed(epoch * n_batches + batch_index)
    indices = rnd.randint(m, size=batch_size)
    X_batch = scaled_housing_data_plus_bias[indices]
    y_batch = housing.target.reshape(-1, 1)[indices]
    return X_batch, y_batch

n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    best_theta = theta.eval()

print("Best theta:")
print(best_theta)

Visualization with TensorBoard

First, define the log directory and file name:

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)

Then add the following two lines:

mse_summary = tf.summary.scalar('MSE', mse)
summary_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

The first line creates a node in the graph that evaluates the MSE and writes it to a summary (a TensorBoard-compatible binary log string). The second line creates a tf.summary.FileWriter, which writes the summaries (and the graph definition) to the log directory.

Finally, use add_summary() to write the summaries out during training. The code is as follows:

tf.reset_default_graph()

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)

n_epochs = 100
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n+1), name='X')
y = tf.placeholder(tf.float32, shape=(None, 1), name='y')
theta = tf.Variable(tf.random_uniform([n+1, 1], -1.0, 1, seed=42), name='theta')
y_pred = tf.matmul(X, theta, name='predictions')
with tf.name_scope('loss') as scope:   # NameScope
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name='mse')
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)
init = tf.global_variables_initializer()

mse_summary = tf.summary.scalar('MSE', mse)
summary_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                summary_writer.add_summary(summary_str, step)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    best_theta = theta.eval()

summary_writer.flush()
summary_writer.close()

print("Best theta:")
print(best_theta)

Start TensorBoard from a terminal:

(tensorflow) ➜  ch09 git:(master) ✗ tensorboard --logdir ./logs
Starting TensorBoard b'41' on port 6006
(You can navigate to http://127.0.0.1:6006)
...

You can now view the graph in your browser at http://127.0.0.1:6006.

Name scopes, modularity, and shared variables

Name Scopes

Complex models can easily end up with thousands of nodes, and the graph becomes cluttered. Name scopes group related nodes together:

with tf.name_scope('loss') as scope:
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name="mse")

print(error.op.name)  # loss/sub
print(mse.op.name)    # loss/mse

Modularity

Consider the following code:

tf.reset_default_graph()

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")

w1 = tf.Variable(tf.random_normal((n_features, 1)), name="weights1")
w2 = tf.Variable(tf.random_normal((n_features, 1)), name="weights2")
b1 = tf.Variable(0.0, name="bias1")
b2 = tf.Variable(0.0, name="bias2")

linear1 = tf.add(tf.matmul(X, w1), b1, name="linear1")
linear2 = tf.add(tf.matmul(X, w2), b2, name="linear2")

relu1 = tf.maximum(linear1, 0, name="relu1")
relu2 = tf.maximum(linear1, 0, name="relu2")  # Oops, cut&paste error! Did you spot it?

output = tf.add_n([relu1, relu2], name="output")

The code above is repetitive and ugly. When the same operation is needed many times, it should be made modular:

tf.reset_default_graph()

def relu(X):
    with tf.name_scope("relu"):
        w_shape = int(X.get_shape()[1]), 1
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        linear = tf.add(tf.matmul(X, w), b, name="linear")
        return tf.maximum(linear, 0, name="max")

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")

summary_writer = tf.summary.FileWriter("logs/relu2", tf.get_default_graph())

Sharing Variables

What if we need a variable that is shared between components? There are several options:

  • Create the variable first and pass it into the function as a parameter. This becomes painful when many variables need to be shared.

tf.reset_default_graph()

def relu(X, threshold):
    with tf.name_scope("relu"):
        w_shape = int(X.get_shape()[1]), 1
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        linear = tf.add(tf.matmul(X, w), b, name="linear")
        return tf.maximum(linear, threshold, name="max")

threshold = tf.Variable(0.0, name="threshold")
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X, threshold) for i in range(5)]
output = tf.add_n(relus, name="output")
  • Store it in a class or a dictionary, or set the shared variable as an attribute of the relu() function itself on the first call.

tf.reset_default_graph()

def relu(X):
    with tf.name_scope("relu"):
        if not hasattr(relu, "threshold"):
            relu.threshold = tf.Variable(0.0, name="threshold")
        w_shape = int(X.get_shape()[1]), 1
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        linear = tf.add(tf.matmul(X, w), b, name="linear")
        return tf.maximum(linear, relu.threshold, name="max")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")
  • Use TensorFlow's own mechanism, described below.

TensorFlow handles shared variables with get_variable(): it creates the variable if it does not exist yet, and reuses it otherwise. Its behavior (create or reuse) is controlled by variable_scope():

tf.reset_default_graph()

def relu(X):
    with tf.variable_scope("relu", reuse=True):
        threshold = tf.get_variable("threshold", shape=(), initializer=tf.constant_initializer(0.0))
        w_shape = int(X.get_shape()[1]), 1
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        linear = tf.add(tf.matmul(X, w), b, name="linear")
        return tf.maximum(linear, threshold, name="max")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(), initializer=tf.constant_initializer(0.0))
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")

summary_writer = tf.summary.FileWriter("logs/relu6", tf.get_default_graph())
summary_writer.close()
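
A small sketch of the create-vs-reuse rule (my own illustration, not from the original post): with the default reuse=False, get_variable() creates the variable and raises an error if it already exists; with reuse=True it raises an error if the variable does not exist yet. This is why threshold is created once, outside relu(), before the calls that reuse it.

import tensorflow as tf

tf.reset_default_graph()

with tf.variable_scope("scope"):
    v = tf.get_variable("v", shape=(), initializer=tf.constant_initializer(0.0))  # created

with tf.variable_scope("scope", reuse=True):
    v2 = tf.get_variable("v")  # reused: the same variable is returned

print(v is v2)  # True

# Either of the following would raise a ValueError:
# with tf.variable_scope("scope"):              # reuse is False, but "scope/v" already exists
#     tf.get_variable("v", shape=())
# with tf.variable_scope("other", reuse=True):  # reuse is True, but "other/v" does not exist
#     tf.get_variable("v")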

In the code above, the shared variable is defined outside the relu() function. The following code moves it inside the function:

import tensorflow as tf

n_features = 3

def relu(X):
    with tf.variable_scope("relu"):
        threshold = tf.get_variable("threshold", shape=(), initializer=tf.constant_initializer(0.0))
        w_shape = int(X.get_shape()[1]), 1
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        linear = tf.add(tf.matmul(X, w), b, name="linear")
        return tf.maximum(linear, threshold, name="max")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
with tf.variable_scope("", default_name="") as scope:
    first_relu = relu(X)     # create the shared variable
    scope.reuse_variables()  # then reuse it
    relus = [first_relu] + [relu(X) for i in range(4)]
output = tf.add_n(relus, name="output")

summary_writer = tf.summary.FileWriter("logs/relu8", tf.get_default_graph())
summary_writer.close()