TensorFlow Learning Series (4): Implementing a Universal Approximator with a Neural Network


This tutorial is a translation of a TensorFlow tutorial written by Morgan, published with the author's permission.

    import tensorflow as tf

    def univAprox(x, hidden_dim=50):
        # The simple case is f: R -> R
        input_dim = 1
        output_dim = 1

        with tf.variable_scope('UniversalApproximator'):
            ua_w = tf.get_variable('ua_w', shape=[input_dim, hidden_dim], initializer=tf.random_normal_initializer(stddev=.1))
            ua_b = tf.get_variable('ua_b', shape=[hidden_dim], initializer=tf.constant_initializer(0.))
            z = tf.matmul(x, ua_w) + ua_b
            a = tf.nn.relu(z) # we now have our hidden_dim activations

            ua_v = tf.get_variable('ua_v', shape=[hidden_dim, output_dim], initializer=tf.random_normal_initializer(stddev=.1))
            z = tf.matmul(a, ua_v)

        return z

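A few notes:

* x must be a rank-2 TensorFlow tensor, i.e. its shape is [None, 1]. Here None stands for the batch size: you can think of it as the capacity to compute as many values at once as you need.
* input_dim and output_dim are hard-coded here; you can of course change them if you want to handle more complex functions. In our example both are set to 1 to keep the problem as simple as possible.
* Finally, we use the ReLU activation function. Many other activation functions would work in its place, and as far as the theorem is concerned the choice makes no difference, since it only requires a nonlinear increasing function; the choice only affects the speed of learning.

Next, let's write a very simple script to evaluate this function: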

    x = tf.placeholder(tf.float32, shape=[None, 1], name="x")
    y = univAprox(x)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        y_res = sess.run(y, feed_dict={
            x: [[0], [1], [2]] # We are batching 3 values at the same time
        })
        print(y_res) # -> [[ 0. ] [ 0.0373688 ] [ 0.07473759]]
        # Those values will be different for you, since we initialize our variables randomly
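
To see what `univAprox` computes without running TensorFlow, here is a minimal pure-Python sketch of the same forward pass on a batch of shape [batch_size, 1]. The names `relu` and `univ_approx_forward` are illustrative, and `random.gauss` merely stands in for `tf.random_normal_initializer`, so the numbers will not match the ones above. The sketch also suggests why the sample output looks linear: the biases start at zero and ReLU satisfies relu(2z) = 2·relu(z), so at initialization the output at 0 is 0 and the output at 2 is twice the output at 1.

```python
import random


def relu(z):
    return max(0.0, z)


def univ_approx_forward(batch, w, b, v):
    """Forward pass of a one-hidden-layer ReLU network, f: R -> R.

    batch: list of [x] rows (shape [batch_size, 1]).
    w, b, v: lists of length hidden_dim (input weights, biases, output weights).
    """
    outputs = []
    for (x,) in batch:
        hidden = [relu(x * w_j + b_j) for w_j, b_j in zip(w, b)]
        outputs.append([sum(h * v_j for h, v_j in zip(hidden, v))])
    return outputs


random.seed(0)
hidden_dim = 50
w = [random.gauss(0.0, 0.1) for _ in range(hidden_dim)]  # like ua_w
b = [0.0] * hidden_dim                                   # like ua_b
v = [random.gauss(0.0, 0.1) for _ in range(hidden_dim)]  # like ua_v

y = univ_approx_forward([[0.0], [1.0], [2.0]], w, b, v)
print(y)
```

Once training moves the biases away from zero, this proportionality disappears, which is what lets the network bend into a non-linear curve.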

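At this point, the design and implementation of our universal approximator (UA) is complete.

Next, we need to train it to approximate any given function on a closed interval.

Let's start with the sine function. Personally, I was not convinced that a neural network could approximate a function well.

Hint: if, like me, you wonder how this kind of approximation is possible at all, here is a mathematical hint:
* Any continuous function on a closed interval can be approximated by a piecewise constant function.
* You can build a neural network by hand that realizes such a piecewise function, adding neurons as needed.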
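
The hint above can be made concrete with a hand-built network. The sketch below is only an illustration under stated assumptions: `build_step_network`, `evaluate`, `max_error` and the `sharpness` parameter are hypothetical names, each ideal step is replaced by a very steep ramp made of two ReLU units, and the network gets a constant output offset (which `univAprox` does not have). No training is involved; the weights are written down directly from the target function.

```python
import math


def relu(z):
    return max(0.0, z)


def build_step_network(target, lo, hi, pieces, sharpness=1e4):
    """Hand-pick hidden units (w, b, v) so the network is nearly piecewise constant.

    The interval [lo, hi] is cut into `pieces` equal pieces; on each piece the
    network outputs the target's value at the piece's midpoint.
    """
    width = (hi - lo) / pieces
    levels = [target(lo + (i + 0.5) * width) for i in range(pieces)]
    units = []
    for i in range(1, pieces):
        edge = lo + i * width
        jump = levels[i] - levels[i - 1]
        # relu(k(x - edge)) - relu(k(x - edge) - 1) rises from 0 to 1
        # over a tiny interval of length 1/k right after `edge`.
        units.append((sharpness, -sharpness * edge, jump))
        units.append((sharpness, -sharpness * edge - 1.0, -jump))
    return units, levels[0]


def evaluate(units, offset, x):
    return offset + sum(v * relu(w * x + b) for w, b, v in units)


def max_error(pieces):
    """Largest gap between the hand-built network and sin on a dense grid."""
    units, offset = build_step_network(math.sin, -math.pi, math.pi, pieces)
    xs = [-math.pi + 2 * math.pi * i / 2000 for i in range(2001)]
    return max(abs(evaluate(units, offset, x) - math.sin(x)) for x in xs)


for pieces in (10, 50, 200):
    print(pieces, max_error(pieces))
```

Each extra piece costs two hidden units here, and the worst-case error shrinks roughly in proportion to the piece width, which is the intuition behind "more neurons, better approximation".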

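Next, we can write a script that does three things:

* Train our UA on the sine function.
* Plot the output of our neural network against the original sine function.
* Accept the hidden_dim value from the command line, so that it is easy to change.

I will post the whole script here, with explanatory comments.

I believe that publishing the complete code is really helpful for learning (don't be put off by the length of the file: much of it is comments and blank lines).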

    # First let's import all the tools needed
    # Some basic tools
    import time, os, argparse, io
    script_dir = os.path.dirname(os.path.realpath(__file__))

    # Tensorflow and numpy!
    import tensorflow as tf
    import numpy as np

    # Matplotlib, so we can graph our functions
    # The Agg backend is here for those running this on a server without X sessions
    import matplotlib
    matplotlib.use('Agg')
    import matplotlib.pyplot as plt

    # Our UA function
    def univAprox(x, hidden_dim=50):
        # The simple case is f: R -> R
        input_dim = 1
        output_dim = 1

        with tf.variable_scope('UniversalApproximator'):
            ua_w = tf.get_variable(
                name='ua_w',
                shape=[input_dim, hidden_dim],
                initializer=tf.random_normal_initializer(stddev=.1)
            )
            ua_b = tf.get_variable(
                name='ua_b',
                shape=[hidden_dim],
                initializer=tf.constant_initializer(0.)
            )
            z = tf.matmul(x, ua_w) + ua_b
            a = tf.nn.relu(z) # we now have our hidden_dim activations

            ua_v = tf.get_variable(
                name='ua_v',
                shape=[hidden_dim, output_dim],
                initializer=tf.random_normal_initializer(stddev=.1)
            )
            z = tf.matmul(a, ua_v)

        return z

    # We define the function we want to approximate
    def func_to_approx(x):
        return tf.sin(x)

    if __name__ == '__main__': # When we call the script directly ...
        # ... we parse a potential --nb_neurons argument
        parser = argparse.ArgumentParser()
        parser.add_argument("--nb_neurons", default=50, type=int, help="Number of neurons of the UA")
        args = parser.parse_args()

        # We build the computation graph
        with tf.variable_scope('Graph') as scope:
            # Our inputs will be a batch of values taken by our functions
            x = tf.placeholder(tf.float32, shape=[None, 1], name="x")

            # We define the ground truth and our approximation
            y_true = func_to_approx(x)
            y = univAprox(x, args.nb_neurons)

            # We define the resulting loss and graph it using tensorboard
            with tf.variable_scope('Loss'):
                loss = tf.reduce_mean(tf.square(y - y_true))
                # (Note the "_t" suffix here. It is pretty handy to avoid mixing
                # tensor summaries and their actual computed summaries)
                loss_summary_t = tf.summary.scalar('loss', loss)

            # We define our train operation using the Adam optimizer
            adam = tf.train.AdamOptimizer(learning_rate=1e-2)
            train_op = adam.minimize(loss)

        # These are some tricks to push our matplotlib graph inside tensorboard
        with tf.variable_scope('TensorboardMatplotlibInput') as scope:
            # Matplotlib will give us the image as a string ...
            img_strbuf_plh = tf.placeholder(tf.string, shape=[])
            # ... encoded in the PNG format ...
            my_img = tf.image.decode_png(img_strbuf_plh, 4)
            # ... that we transform into an image summary
            img_summary = tf.summary.image(
                'matplotlib_graph',
                tf.expand_dims(my_img, 0)
            )

        # We create a Saver as we want to save our UA after training
        saver = tf.train.Saver()

        with tf.Session() as sess:
            # We create a SummaryWriter to save data for TensorBoard
            result_folder = script_dir + '/results/' + str(int(time.time()))
            sw = tf.summary.FileWriter(result_folder, sess.graph)

            print('Training our universal approximator')
            sess.run(tf.global_variables_initializer())
            for i in range(3000):
                # We uniformly select a lot of points for a good approximation ...
                x_in = np.random.uniform(-10, 10, [100000, 1])
                # ... and train on it
                current_loss, loss_summary, _ = sess.run([loss, loss_summary_t, train_op], feed_dict={
                    x: x_in
                })
                # We leverage tensorboard by keeping track of the loss in real time
                sw.add_summary(loss_summary, i + 1)

                if (i + 1) % 100 == 0:
                    print('batch: %d, loss: %f' % (i + 1, current_loss))

            print('Plotting graphs')
            # We compute a dense enough graph of our functions
            inputs = np.array([ [(i - 1000) / 100.0] for i in range(2000) ])
            y_true_res, y_res = sess.run([y_true, y], feed_dict={
                x: inputs
            })

            # We plot it using matplotlib
            # (This is some matplotlib wizardry to get an image as a string,
            # read the matplotlib documentation for more information)
            plt.figure(1)
            plt.subplot(211)
            plt.plot(inputs, y_true_res.flatten())
            plt.subplot(212)
            plt.plot(inputs, y_res)
            imgdata = io.BytesIO()
            plt.savefig(imgdata, format='png')
            imgdata.seek(0)

            # We push our graph into TensorBoard
            plot_img_summary = sess.run(img_summary, feed_dict={
                img_strbuf_plh: imgdata.getvalue()
            })
            sw.add_summary(plot_img_summary, i + 1)
            plt.clf()

            # Finally we save the graph to check that it looks like what we wanted
            saver.save(sess, result_folder + '/data.chkp')

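Now open two terminals on your machine and run the following commands from the directory that contains the script to see what happens:

python myfile.py --nb_neurons 50
tensorboard --logdir results --reload_interval 5 (according to the tutorial, the default reload_interval is 120 seconds, to avoid polling the machine too often; in our case we can safely speed it up a little)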

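You can now watch the UA train in real time and see how it learns the sine function.

Keep in mind that adding more neurons to the hidden layer improves the approximation.

Let me show you the approximations obtained with 4 different hidden layer sizes: [20, 50, 100, 500].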

Different graph showing the effect of the number of neurons in the UA

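As expected, the more neurons we add, the better our UA approximates the sine function. In fact, we can make the network's output as close to the target function as we like. Isn't that beautiful? :)

Our UA has one big drawback, however: it cannot be reused as soon as input_dim changes.

Here is a crazy idea: what if we designed a UA that plays the role of the activation function inside a more complex neural network, and learns it? Wouldn't that be cool?

I think this makes a great exercise: how can you trick TensorFlow into handling a dynamic input dimension? (You can find a solution on my Github, but I suggest you try writing it yourself first.)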
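
One possible way to approach this exercise (an assumption on my part, not necessarily what the author's Github code does) is to treat every entry of the input as its own scalar: flatten the input into a column of shape [N, 1], run the scalar model on that column, then restore the original nesting. The pure-Python sketch below only illustrates the idea; `flatten`, `unflatten`, `apply_elementwise` and `toy_scalar_model` are hypothetical helpers, and the toy model stands in for a trained UA.

```python
import math


def flatten(nested):
    """Return all scalar leaves of a nested list, in order."""
    if isinstance(nested, list):
        return [leaf for item in nested for leaf in flatten(item)]
    return [nested]


def unflatten(leaves, template):
    """Rebuild the nesting of `template`, taking values from the iterator `leaves`."""
    if isinstance(template, list):
        return [unflatten(leaves, item) for item in template]
    return next(leaves)


def apply_elementwise(scalar_model, nested):
    """Run a [N, 1] -> [N, 1] model on every entry of an arbitrarily nested input."""
    column = [[value] for value in flatten(nested)]  # shape [N, 1]
    outputs = scalar_model(column)                   # shape [N, 1]
    return unflatten((row[0] for row in outputs), nested)


def toy_scalar_model(column):
    """Stand-in for a trained UA: maps [N, 1] rows to [N, 1] rows."""
    return [[math.tanh(x)] for (x,) in column]


vector = [0.0, 1.0, -2.0]
matrix = [[0.0, 1.0], [2.0, 3.0]]
print(apply_elementwise(toy_scalar_model, vector))
print(apply_elementwise(toy_scalar_model, matrix))
```

In TensorFlow the flatten and rebuild steps would presumably correspond to `tf.reshape(x, [-1, 1])` followed by a reshape back to `tf.shape(x)`, so the same UA weights are shared by every entry regardless of the input's shape.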

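To close the article, here is a small gift: I used this second approach to train a neural network on the MNIST dataset, that is, a neural network in which another neural network replaces the activation function.

The figure below shows the activation functions that were learned. Exciting, isn't it!

Note: inside the second UA I used the ELU function as the activation, which is why the curves look convex. These approximations also showed up repeatedly across runs.
I reached an accuracy of 0.98 on the MNIST test set, which suggests to me that the choice of activation function may not matter much for learning a task.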

References:

Universal approximation theorem

Universal Approximation Theorem — Neural Networks
