2 Convolutional Neural Network APIs Explained - 2.5 TensorFlow Loss Function APIs Explained

Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 02:15

References:
http://www.tensorfly.cn/tfdoc/api_docs/python/nn.html#l2_loss
http://www.cnblogs.com/lovephysics/p/7222488.html
https://www.2cto.com/kf/201612/580565.html
http://www.studyai.com/
http://www.tensorfly.cn/tfdoc/api_docs/python/nn.html#AUTOGENERATED-classification
http://blog.csdn.net/u012871493/article/details/72758186

Losses

The loss ops measure error between two tensors, or between a tensor and zero. These can be used for measuring accuracy of a network in a regression task or for regularization purposes (weight decay).

tf.nn.l2_loss(t, name=None)

L2 Loss.

Computes half the L2 norm of a tensor without the sqrt:

output = sum(t ** 2) / 2
Args:

t: A Tensor. Must be one of the following types: float32, float64, int64, int32, uint8, int16, int8, complex64, qint8, quint8, qint32. Typically 2-D, but may have any dimensions.
name: A name for the operation (optional).
Returns:

A Tensor. Has the same type as t. 0-D.

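Explanation: this function computes an error value for a tensor using the L2 norm, but it skips the square root and returns only half of the squared norm, as follows: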

output = sum(t ** 2) / 2
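To make the formula concrete, here is a minimal pure-Python sketch of the same computation, assuming the tensor is represented as a nested list. The names l2_loss and with_weight_decay are illustrative helpers, not TensorFlow APIs; the second one shows the weight-decay use mentioned at the top of this section, with a hypothetical decay coefficient.

```python
def l2_loss(t):
    """Return sum(t ** 2) / 2 over every element of a nested list."""
    def flatten(x):
        # Walk nested lists/tuples so any number of dimensions is accepted.
        if isinstance(x, (list, tuple)):
            for item in x:
                yield from flatten(item)
        else:
            yield x
    return sum(v * v for v in flatten(t)) / 2


def with_weight_decay(data_loss, weights, decay):
    # Weight decay: penalize large weights by adding decay * l2_loss(weights).
    # `decay` is a hypothetical hyperparameter chosen by the user.
    return data_loss + decay * l2_loss(weights)


a = [1.0, 2.0, 3.0]
b = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(l2_loss(a))  # 7.0
print(l2_loss(b))  # 14.0
```

The dimensionality of the input does not matter: every element contributes its square once, which is why the 1-D and 2-D examples below give 7.0 and 14.0.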

2. TensorFlow implementation

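import tensorflow as tf

a = tf.constant([1, 2, 3], dtype=tf.float32)
b = tf.constant([[1, 1], [2, 2], [3, 3]], dtype=tf.float32)

# TensorFlow 1.x graph mode: ops are evaluated inside a Session.
# The `with` block closes the session automatically, so no sess.close() is needed.
with tf.Session() as sess:
    print('a:')
    print(sess.run(tf.nn.l2_loss(a)))  # (1 + 4 + 9) / 2 = 7.0
    print('b:')
    print(sess.run(tf.nn.l2_loss(b)))  # (1 + 1 + 4 + 4 + 9 + 9) / 2 = 14.0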

Output:
a:
7.0
b:
14.0

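Input arguments:

t: A Tensor. Its dtype must be one of: float32, float64, int64, int32, uint8, int16, int8, complex64, qint8, quint8, qint32. It is usually 2-D, but it may have any number of dimensions.
name: A name for the operation (optional).
Returns:

A Tensor of the same dtype as t; it is a scalar (0-D).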

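Classifier loss: definition of the cross-entropy loss

Cross entropy:

Cross entropy is one kind of loss function (also called a cost function), used to measure how far the model's predictions are from the true values. Another common loss function is the mean squared error (MSE); for a single sample it is defined as: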
$$C = \frac{(y - a)^2}{2}$$

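The squared error is easy to understand: subtract the true value from the prediction, take the absolute value or the square to avoid negative numbers, and average over the samples to get the mean squared error. Note that the prediction here is assumed to have gone through a sigmoid activation, so it lies between 0 and 1. The squared error does capture the gap between prediction and truth, but for classification problems it does not work as well as cross entropy.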

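The output of the neuron is $a = \sigma(z)$, where $z = \sum_j w_j x_j + b$. We define the cross-entropy cost function of this neuron as:
$$C = -\frac{1}{n} \sum_x \left[ y \ln a + (1 - y) \ln(1 - a) \right]$$
where n is the number of training samples, the sum runs over all training inputs x, and y is the desired output.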
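A minimal pure-Python sketch of the two cost functions above, averaged over n samples with a = σ(z). The helper names and the sample values are illustrative assumptions. This direct form can hit log(0) when |z| is large, which is one reason TensorFlow's functions take logits rather than probabilities.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy_cost(ys, zs):
    """C = -(1/n) * sum_x [y ln a + (1 - y) ln(1 - a)] with a = sigmoid(z)."""
    total = 0.0
    for y, z in zip(ys, zs):
        a = sigmoid(z)
        total += y * math.log(a) + (1 - y) * math.log(1 - a)
    return -total / len(ys)


def quadratic_cost(ys, zs):
    """Mean over samples of (y - a)^2 / 2 with a = sigmoid(z)."""
    return sum((y - sigmoid(z)) ** 2 / 2 for y, z in zip(ys, zs)) / len(ys)


ys = [0.0, 1.0, 1.0]
good = [-4.0, 4.0, 3.0]  # pre-activations that push a toward y
bad = [4.0, -4.0, -3.0]  # pre-activations that push a away from y
print(cross_entropy_cost(ys, good), cross_entropy_cost(ys, bad))
print(quadratic_cost(ys, good), quadratic_cost(ys, bad))
```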
Why can cross entropy serve as a cost function?

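Two properties of cross entropy explain why it is a reasonable cost function. First, it is non-negative, that is, C ≥ 0. To see this, note that (1) every term inside the sum is non-positive, because a lies between 0 and 1 and the logarithm of such a number is negative; and (2) there is a minus sign in front of the whole expression.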

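Second, if the neuron's actual output is close to the desired output for every training input x, the cross entropy will be close to 0. To see this, suppose that for some input x we have y = 0 and a ≈ 0, which is a good output. The first term of the expression vanishes because y = 0, and the second term is −ln(1 − a) ≈ 0. The same analysis applies when y = 1 and a ≈ 1. So when the actual outputs are close to the desired outputs, the cost is low.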

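To sum up, cross entropy is non-negative, and it approaches 0 when the outputs for all inputs x approach the desired outputs y. Intuitively, both properties are what we want from a cost function. In fact, the quadratic (mean squared error) cost satisfies both of them as well, so cross entropy is at least as good on these two counts. But cross entropy has one more property that the quadratic cost lacks: it avoids the slowdown in learning that occurs when a sigmoid neuron saturates.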
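The learning-slowdown claim can be checked with the single-sample derivatives with respect to z. For the quadratic cost, ∂C/∂z = (a − y)·σ'(z) with σ'(z) = a(1 − a); for cross entropy, ∂C/∂z = −[y/a − (1 − y)/(1 − a)]·a(1 − a) = a − y, so the σ'(z) factor cancels. The sketch below evaluates both for a saturated neuron; the helper names and input values are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_quadratic(y, z):
    # dC/dz for C = (y - a)^2 / 2: (a - y) * sigmoid'(z), with sigmoid'(z) = a * (1 - a)
    a = sigmoid(z)
    return (a - y) * a * (1 - a)


def grad_cross_entropy(y, z):
    # dC/dz for C = -[y ln a + (1 - y) ln(1 - a)]: sigmoid'(z) cancels, leaving a - y
    return sigmoid(z) - y


# A badly wrong, saturated neuron: target y = 0, but z = 6 drives a close to 1.
y, z = 0.0, 6.0
print(grad_quadratic(y, z))      # about 0.0025: tiny step, learning slows down
print(grad_cross_entropy(y, z))  # about 0.9975: still a large corrective signal
```

The worse the saturated output, the smaller σ'(z) becomes, so the quadratic gradient shrinks exactly when a large correction is needed; the cross-entropy gradient stays proportional to the error a − y.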

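The above explains why cross entropy can serve as a loss function: first, its value is never negative; second, the more accurate the prediction, the smaller the value. Note that the a used in the formula has also gone through a sigmoid activation, so it lies between 0 and 1.

The sigmoid activation is emphasized repeatedly here because some of these functions cannot be used for multi-target or multi-class problems; TensorFlow itself provides several implementations of cross entropy.
TensorFlow's four cross-entropy functions:

tf.nn.sigmoid_cross_entropy_with_logits
tf.nn.softmax_cross_entropy_with_logits
tf.nn.sparse_softmax_cross_entropy_with_logits
tf.nn.weighted_cross_entropy_with_logits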

sigmoid_cross_entropy_with_logits explained
See: http://www.tensorfly.cn/tfdoc/api_docs/python/nn.html#sigmoid_cross_entropy_with_logits

tf.nn.sigmoid_cross_entropy_with_logits(logits, targets, name=None)
Computes sigmoid cross entropy given logits.
Measures the probability error in discrete classification tasks in which each class is independent and not mutually exclusive. For instance, one could perform multilabel classification where a picture can contain both an elephant and a dog at the same time.
For brevity, let x = logits, z = targets. The logistic loss is
x - x * z + log(1 + exp(-x))
To ensure stability and avoid overflow, the implementation uses
max(x, 0) - x * z + log(1 + exp(-abs(x)))
logits and targets must have the same type and shape.
Args:
logits: A Tensor of type float32 or float64.
targets: A Tensor of the same type and shape as logits.
name: A name for the operation (optional).
Returns:
A Tensor of the same shape as logits with the componentwise logistic losses.
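A quick pure-Python check that three forms agree: the textbook cross entropy applied to sigmoid(x), the logistic loss x − x·z + log(1 + exp(−x)), and the stable form quoted in the docstring. The function names are hypothetical; the last lines show why the stable form is preferred for large negative logits.

```python
import math


def naive_logistic_loss(x, z):
    # x - x * z + log(1 + exp(-x)); exp(-x) overflows for very negative x
    return x - x * z + math.log(1 + math.exp(-x))


def stable_logistic_loss(x, z):
    # max(x, 0) - x * z + log(1 + exp(-abs(x))), the form quoted in the docstring
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))


def direct_cross_entropy(x, z):
    # -[z ln sigmoid(x) + (1 - z) ln(1 - sigmoid(x))], the textbook definition
    a = 1.0 / (1.0 + math.exp(-x))
    return -(z * math.log(a) + (1 - z) * math.log(1 - a))


for x, z in [(2.0, 1.0), (-1.5, 0.0), (0.3, 0.7)]:
    print(naive_logistic_loss(x, z), stable_logistic_loss(x, z), direct_cross_entropy(x, z))

print(stable_logistic_loss(-1000.0, 1.0))  # 1000.0
# naive_logistic_loss(-1000.0, 1.0) would raise OverflowError from math.exp(1000.0)
```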

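Let's start with sigmoid_cross_entropy_with_logits, because its implementation matches the cross-entropy definition given above, and it was also the first cross-entropy function TensorFlow implemented. Its inputs are logits and targets. logits is the W * X matrix of the neural network model; note that it must not be passed through a sigmoid first. targets has the same shape as logits and holds the correct label values.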
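The following sketch applies the stable formula element by element to a small [batch_size, num_classes] batch, assuming nested lists stand in for tensors. It illustrates the contract in the docstring: logits are raw scores with no sigmoid applied, targets have the same shape, and the result is a matrix of componentwise losses. The batch, the helper name and the animal labels are hypothetical.

```python
import math


def sigmoid_ce_with_logits(logits, targets):
    """Elementwise stable sigmoid cross entropy; output has the shape of `logits`."""
    return [
        [max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x))) for x, z in zip(row_x, row_z)]
        for row_x, row_z in zip(logits, targets)
    ]


# Hypothetical batch: 2 images, 3 independent labels (elephant, dog, cat).
logits = [[2.0, 1.5, -3.0],
          [-1.0, 0.5, 2.5]]
targets = [[1.0, 1.0, 0.0],   # elephant AND dog: several 1s are allowed
           [0.0, 0.0, 0.0]]   # no animal at all: zero 1s is allowed too
losses = sigmoid_ce_with_logits(logits, targets)
print(len(losses), len(losses[0]))  # 2 3
mean_loss = sum(sum(row) for row in losses) / (len(losses) * len(losses[0]))
print(mean_loss)
```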

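For example, if the model has to decide, for a batch of 100 images, whether each image contains each of 10 kinds of animals, both inputs have shape [100, 10]. The docstring also notes that these 10 classes are independent and need not be mutually exclusive. We call this kind of problem multi-target: when deciding whether an image contains any of 10 animals, a label row may contain several 1s or none at all. Another kind of problem is multi-class: for example, if age is split into 5 ranges, exactly one of the 5 values may be 1. Can this function be used directly for that kind of problem? The answer is no. Let's look at the implementation of sigmoid_cross_entropy_with_logits.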

(Figure: source code of sigmoid_cross_entropy_with_logits)

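As you can see, this is the standard cross-entropy implementation: the values of W * X are passed through a sigmoid so that they lie between 0 and 1, and are then plugged into the cross-entropy formula to compute the loss. That works for binary classification, but consider the multi-class case above: suppose age takes values 0 to 4, so the targets also range from 0 to 4. After the sigmoid, the prediction is confined to 0 to 1, the 1 − z term in the formula becomes negative, and, on reflection, the values 0 to 4 are not even linearly related. Plugging the raw label values into the formula would certainly produce a very large error, so multi-class labels cannot be used directly. We can work around this by one-hot encoding the 5 age ranges into a 5-dimensional label and training them as 5 separate targets, but then nothing guarantees that exactly one prediction is 1. For this kind of problem TensorFlow provides softmax-based cross-entropy functions.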
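To see why one-hot targets trained with a sigmoid do not behave like a single-choice prediction, the sketch below scores 5 hypothetical age-group logits independently; nothing forces the probabilities to sum to 1, or only one of them to exceed 0.5.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical logits for the 5 age groups of one person.
logits = [0.8, -0.5, 1.2, -2.0, 0.3]
probs = [sigmoid(x) for x in logits]
print(sum(probs))                # not 1: each group is scored on its own
print([p > 0.5 for p in probs])  # [True, False, True, False, True]
```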

softmax_cross_entropy_with_logits explained
See: http://www.tensorfly.cn/tfdoc/api_docs/python/nn.html#softmax_cross_entropy_with_logits

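The softmax algorithm itself is simple: raise e to the power of each value, sum the results, and compute each value's share of the sum, which guarantees that the outputs add up to 1. We can generally treat softmax outputs as confidences, that is, probabilities. The implementation is as follows.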
(Figure: implementation of softmax)
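A minimal pure-Python softmax, assuming a plain list of scores. Subtracting the maximum before exponentiating is a common numerical-stability trick that the text above does not mention; it leaves the result unchanged because the factor e^(−max) cancels between numerator and denominator.

```python
import math


def softmax(logits):
    """e^x_i / sum_j e^x_j, computed after subtracting max(logits) to avoid overflow."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)       # each value in (0, 1), largest for the largest logit
print(sum(probs))  # 1.0 up to rounding
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]; a naive math.exp(1000.0) would overflow
```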

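softmax_cross_entropy_with_logits is quite different from sigmoid_cross_entropy_with_logits. The inputs are similar (logits and labels have the same shape), but here the classes must be mutually exclusive, so that exactly one entry is set. For example, each CIFAR-10 image belongs to exactly one class, unlike the earlier task of detecting several kinds of animals in one image. Why this restriction? The docstring says the logits passed in are unscaled: apply neither sigmoid nor softmax to them, because the function applies softmax internally, more efficiently. After softmax, any input becomes a set of probabilities that sum to 1, and these can be plugged into the multi-class form of the cross-entropy formula, −Σ_i y_i ln(a_i), to obtain a meaningful loss. In a multi-target problem, softmax cannot make several outputs close to 1 at once (its outputs always sum to 1), and a label row with several 1s is not a valid probability distribution, so the cross entropy is not meaningful. This function is therefore suited only to single-target binary or multi-class classification. The TensorFlow definition is as follows.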

tf.nn.softmax_cross_entropy_with_logits(logits, labels, name=None)
Computes softmax cross entropy between logits and labels.
Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.
WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
logits and labels must have the same shape [batch_size, num_classes] and the same dtype (either float32 or float64).
Args:
logits: Unscaled log probabilities.
labels: Each row labels[i] must be a valid probability distribution.
name: A name for the operation (optional).
Returns:
A 1-D Tensor of length batch_size of the same type as logits with the softmax cross entropy loss.
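A sketch of the per-example loss the docstring describes, −Σ_i labels[i]·log(softmax(logits)[i]), computed through a log-softmax to stay numerically stable. It is a pure-Python illustration with hypothetical helper names, not TensorFlow's implementation; the last lines reproduce the misuse the WARNING describes, feeding softmax outputs instead of raw logits.

```python
import math


def log_softmax(row):
    # log(softmax(row)) computed as x - logsumexp(row), shifted by max(row) for stability
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def softmax_ce_with_logits(logits, labels):
    """[batch_size, num_classes] inputs -> list of batch_size per-example losses."""
    return [-sum(y * lp for y, lp in zip(lab, log_softmax(row)))
            for row, lab in zip(logits, labels)]


logits = [[2.0, 1.0, 0.1],
          [0.5, 2.5, 0.3]]
labels = [[1.0, 0.0, 0.0],   # one-hot: class 0
          [0.0, 1.0, 0.0]]   # one-hot: class 1
losses = softmax_ce_with_logits(logits, labels)
print(losses)

# The misuse the WARNING describes: passing softmax outputs applies softmax twice.
probs = [[math.exp(lp) for lp in log_softmax(row)] for row in logits]
wrong = softmax_ce_with_logits(probs, labels)
print(wrong)  # differs from `losses`
```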

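One more point: for a multi-class problem such as our 5 age groups, manually coded as 0, 1, 2, 3, 4, the output is a 5-dimensional feature, so we have to one-hot encode the labels by hand, as 00001, 00010, 00100, 01000, 10000, before they can be passed to this function. In principle one-hot encoding is not strictly required: any probability distribution also works, but it must sum to 1, since a row that does not sum to 1 has no clear meaning. The TensorFlow C++ implementation was planned to check these arguments, so that users are warned about such misuse early.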
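One-hot encoding itself takes only a few lines. In this sketch the 1 is placed at index label, so class 0 becomes 10000; the 00001 ordering used above is the mirror image and works equally well as long as it is applied consistently. The helpers one_hot and is_distribution are hypothetical; the second checks the "must sum to 1" condition for soft labels.

```python
def one_hot(label, num_classes):
    """Vector of length num_classes with a single 1.0 at index `label`."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} outside [0, {num_classes})")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def is_distribution(row, tol=1e-6):
    # A valid soft label: no negative entries and a total of 1 (within tolerance).
    return all(v >= 0 for v in row) and abs(sum(row) - 1.0) <= tol


ages = [0, 3, 4]  # hypothetical age-group codes
print([one_hot(a, 5) for a in ages])
# [[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0]]
print(is_distribution([0.2, 0.3, 0.5]), is_distribution([1.0, 1.0, 0.0]))  # True False
```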

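sparse_softmax_cross_entropy_with_logits explained

sparse_softmax_cross_entropy_with_logits is an easier-to-use version of softmax_cross_entropy_with_logits: apart from its input arguments, its purpose and algorithm are the same. As mentioned above, softmax_cross_entropy_with_logits requires labels that are multi-dimensional features similar to a one-hot encoding, but CIFAR-10, ImageNet and most classification tasks have a single target whose label is an integer coded from 0, and converting it to one-hot every time is tedious. Is there a better way? Yes: sparse_softmax_cross_entropy_with_logits. Its first argument, logits, is the same as before, with shape [batch_size, num_classes]. Its second argument, labels, which previously had to be [batch_size, num_classes] for the cross entropy to be computable, is changed to the more restrictive shape [batch_size], and its values must be int32 or int64 coded from 0, in the range [0, num_classes). If we code from 1 or use a step larger than 1, some label values will fall outside this range and the code will fail with an error. This makes sense: only with this restriction can TensorFlow know which class a label of 3, 6 or 9 refers to, and then perform the equivalent of one-hot encoding efficiently inside. It merely simplifies what the user has to pass in; if you have already one-hot encoded your labels, you can use softmax_cross_entropy_with_logits, without "sparse", directly.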
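The equivalence between the sparse and dense forms can be sketched in pure Python: with integer labels, the loss is just the negative log-softmax entry at the label's index, which equals the dense loss computed with the corresponding one-hot row. The helper names are hypothetical, and raising ValueError for an out-of-range label is this sketch's way of mirroring the error described above.

```python
import math


def log_softmax(row):
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def sparse_softmax_ce(logits, labels):
    """labels: [batch_size] ints in [0, num_classes); loss = -log_softmax(row)[label]."""
    losses = []
    for row, label in zip(logits, labels):
        if not 0 <= label < len(row):
            raise ValueError(f"label {label} outside [0, {len(row)})")
        losses.append(-log_softmax(row)[label])
    return losses


def dense_softmax_ce(logits, onehot_labels):
    return [-sum(y * lp for y, lp in zip(lab, log_softmax(row)))
            for row, lab in zip(logits, onehot_labels)]


logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
sparse = sparse_softmax_ce(logits, [0, 1])
dense = dense_softmax_ce(logits, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(sparse, dense)  # the same values
```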

weighted_cross_entropy_with_logits explained

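weighted_cross_entropy_with_logits is an extension of sigmoid_cross_entropy_with_logits. Its arguments and implementation are almost the same as the latter's, except that it accepts one extra argument, pos_weight, which increases or decreases the loss contributed by positive samples when computing the cross entropy. The idea is simple: on top of the usual sigmoid-based cross entropy, the value computed for positive samples is multiplied by a coefficient. The implementation is as follows.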
(Figure: implementation of weighted_cross_entropy_with_logits)
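A pure-Python sketch of the weighting, assuming the loss pos_weight · z · (−log σ(x)) + (1 − z) · (−log(1 − σ(x))) with x = logits and z = targets, rewritten in a stable form analogous to the sigmoid case: (1 − z)·x + (1 + (pos_weight − 1)·z)·(log(1 + exp(−|x|)) + max(−x, 0)). The helper names are hypothetical. With pos_weight = 1 it reduces to the ordinary sigmoid cross entropy, and only the positive (z = 1) term is scaled.

```python
import math


def weighted_ce_with_logits(x, z, pos_weight):
    """Sigmoid cross entropy whose positive term is multiplied by pos_weight."""
    log_weight = 1.0 + (pos_weight - 1.0) * z
    # log(1 + exp(-abs(x))) + max(-x, 0) is a stable way to write -log(sigmoid(x))
    return (1.0 - z) * x + log_weight * (math.log1p(math.exp(-abs(x))) + max(-x, 0.0))


def plain_sigmoid_ce(x, z):
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))


x = 0.5
print(weighted_ce_with_logits(x, 1.0, 1.0), plain_sigmoid_ce(x, 1.0))  # equal when pos_weight = 1
print(weighted_ce_with_logits(x, 1.0, 3.0))  # positive-sample loss scaled up 3x
print(weighted_ce_with_logits(x, 0.0, 3.0), plain_sigmoid_ce(x, 0.0))  # negatives unaffected
```

A pos_weight above 1 penalizes missed positives more heavily, which is useful when positive samples are rare.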

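Summary

These are the cross-entropy functions TensorFlow currently provides. Users need to understand the difference between multi-target and multi-class scenarios and, depending on the business requirement (whether the classes are independent, and whether they are mutually exclusive), choose a sigmoid-based or a softmax-based implementation. With sigmoid, a weighted variant is also available; with softmax, we can either one-hot encode the labels ourselves or use the more convenient sparse_softmax_cross_entropy_with_logits.

The cross-entropy functions TensorFlow provides basically cover multi-target and multi-class problems. For a problem that is both multi-target and multi-class, however, softmax_cross_entropy_with_logits certainly cannot be used. If we use sigmoid_cross_entropy_with_logits instead, we treat every class feature as independent, when in fact exactly one of them is 1 and they are not independent, so the loss is less effective than with softmax. We can predict that the TensorFlow community will implement more ops to handle such problems, and we look forward to more people contributing algorithms and code to TensorFlow :)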
