Machine Learning: The Softmax Function

Source: Internet. Editor: 程序博客网. Date: 2024/06/06 03:18

Softmax Classification Function

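This tutorial is a translation of the neural network tutorial written by Peter Roelants, published with the author's permission.

The series is an introduction to neural networks and consists of five parts. The complete contents are listed below.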

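(1) Introduction to Neural Networks: Linear Regression
Logistic Classification Function
(2) Introduction to Neural Networks: Logistic Regression (Classification)
(3) Introduction to Neural Networks: Hidden Layer Design
Softmax Classification Function
(4) Introduction to Neural Networks: Vectorization
(5) Introduction to Neural Networks: Building Multilayer Networks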

Softmax classification function

This part of the tutorial covers two topics:

The softmax function
The cross-entropy loss function

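In the previous tutorial we learned how to use the logistic function for binary classification. For multiclass problems we can use multinomial logistic regression, which relies on a generalization of the logistic function called the softmax function. Next, we explain what the softmax function is and how it is derived.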

We start by importing the packages used in this tutorial.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import colorConverter, ListedColormap
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm

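Softmax function

As we saw in the previous tutorial, the logistic function can only be used for two-class problems, but its multinomial generalization, the softmax function, can handle multiple classes. Suppose the softmax function ς takes a C-dimensional vector z as input. Its output is then also a C-dimensional vector y, whose entries lie between 0 and 1. The softmax function is simply a normalized exponential function, defined as: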

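$$ y_c = \varsigma(\mathbf{z})_c = \frac{e^{z_c}}{\sum_{d=1}^{C} e^{z_d}} \quad \text{for } c = 1, \dots, C $$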

The denominator in this expression acts as a normalizer, so that the outputs sum to 1: $\sum_{c=1}^{C} y_c = 1$.

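When softmax is used as the output layer of a neural network, its output values can be represented by C neurons.

For a given input z, the probability of each class t = c, for c = 1 ... C, can be written as: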

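$$ \begin{bmatrix} P(t=1|\mathbf{z}) \\ \vdots \\ P(t=C|\mathbf{z}) \end{bmatrix} = \begin{bmatrix} \varsigma(\mathbf{z})_1 \\ \vdots \\ \varsigma(\mathbf{z})_C \end{bmatrix} = \frac{1}{\sum_{d=1}^{C} e^{z_d}} \begin{bmatrix} e^{z_1} \\ \vdots \\ e^{z_C} \end{bmatrix} $$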

Here P(t=c|z) is the probability that the input belongs to class c, given the input z.
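The class probabilities described above can be sketched numerically. This is a minimal illustration and not part of the original tutorial: the helper name softmax_probs and the input z = [1, 2, 3] are assumptions chosen for the example.

```python
import numpy as np

def softmax_probs(z):
    """Return the vector of P(t=c|z) for c = 1 ... C (hypothetical helper)."""
    exp_z = np.exp(z)              # numerator e^{z_c} for every class
    return exp_z / np.sum(exp_z)   # divide by the shared normalizer

z = np.array([1.0, 2.0, 3.0])      # illustrative input with C = 3 classes
probs = softmax_probs(z)

print(probs)             # one probability per class, each between 0 and 1
print(probs.sum())       # the probabilities sum to 1 (up to rounding)
print(np.argmax(probs))  # index of the most probable class
```

The largest entry of z receives the largest probability, and the entries of the result always sum to 1 because they share the same denominator.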

The figure below shows the output probability P(t=1|z) for a two-class problem (t = 1, t = 2) with input vector z = [z1, z2].

# Define the softmax function
def softmax(z):
    return np.exp(z) / np.sum(np.exp(z))
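The definition above is fine for the input range used in this tutorial, but np.exp overflows for large inputs, which turns the result into nan. A common workaround, which is not part of the original tutorial, is to subtract the maximum of z before exponentiating. This leaves the result unchanged because the common factor cancels between numerator and denominator. The name softmax_stable is an assumption made for this sketch.

```python
import numpy as np

def softmax_stable(z):
    """Softmax with the maximum subtracted first (hypothetical variant)."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z)        # largest entry becomes 0, so exp cannot overflow
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

# Inputs this large would overflow in the plain definition
z_large = np.array([1000.0, 1001.0, 1002.0])
p_large = softmax_stable(z_large)
print(p_large)

# Shifting all inputs by the same constant gives the same probabilities
p_small = softmax_stable(np.array([1.0, 2.0, 3.0]))
print(np.allclose(p_large, p_small))
```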

# Plot the softmax output for 2 dimensions (two classes)
# Plot the output as a function of the inputs z
# Define a vector of input values for which we want to plot the output
nb_of_zs = 200
zs = np.linspace(-10, 10, num=nb_of_zs)  # input values
zs_1, zs_2 = np.meshgrid(zs, zs)  # generate grid
y = np.zeros((nb_of_zs, nb_of_zs, 2))  # initialize output
# Fill the output matrix for each combination of input z's
for i in range(nb_of_zs):
    for j in range(nb_of_zs):
        y[i,j,:] = softmax(np.asarray([zs_1[i,j], zs_2[i,j]]))
# Plot the output probability surface for t=1
fig = plt.figure()
# add_subplot replaces fig.gca(projection='3d'), which newer matplotlib versions no longer accept
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(zs_1, zs_2, y[:,:,0], linewidth=0, cmap=cm.coolwarm)
ax.view_init(elev=30, azim=70)
cbar = fig.colorbar(surf)
ax.set_xlabel('$z_1$', fontsize=15)
ax.set_ylabel('$z_2$', fontsize=15)
ax.set_zlabel('$y_1$', fontsize=15)
# Raw strings keep the backslash in \mathbf from being read as an escape
ax.set_title(r'$P(t=1|\mathbf{z})$')
cbar.ax.set_ylabel(r'$P(t=1|\mathbf{z})$', fontsize=15)
plt.grid()
plt.show()
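For the two-class case plotted above, dividing numerator and denominator by e^{z1} shows that P(t=1|z) = 1 / (1 + e^{-(z1 - z2)}), which is the logistic function applied to z1 - z2. This is one way to see softmax as a generalization of the logistic function. The following is a small numerical check under that derivation; the helper logistic and the random test points are assumptions made for this sketch.

```python
import numpy as np

def softmax(z):
    return np.exp(z) / np.sum(np.exp(z))

def logistic(x):
    """Logistic (sigmoid) function, hypothetical helper for this check."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)            # fixed seed for reproducibility
points = rng.uniform(-10, 10, size=(5, 2))  # a few (z1, z2) pairs in the plotted range

max_diff = 0.0
for z1, z2 in points:
    p_softmax = softmax(np.array([z1, z2]))[0]  # P(t=1|z) from softmax
    p_logistic = logistic(z1 - z2)              # logistic of the difference
    max_diff = max(max_diff, abs(p_softmax - p_logistic))

print(max_diff)  # difference between the two computations, expected to be tiny
```

The surface in the figure therefore depends only on the difference z1 - z2, which is why it looks like a logistic curve extruded along the diagonal.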