Is cross-entropy non-negative?


In Ian Goodfellow's Deep Learning Book there is the following passage:

One unusual property of the cross-entropy cost used to perform maximum likelihood estimation is that it usually does not have a minimum value when applied to the models commonly used in practice. For discrete output variables, most models are parametrized in such a way that they cannot represent a probability of zero or one, but can come arbitrarily close to doing so. Logistic regression is an example of such a model. For real-valued output variables, if the model can control the density of the output distribution (for example, by learning the variance parameter of a Gaussian output distribution) then it becomes possible to assign extremely high density to the correct training set outputs, resulting in cross-entropy approaching negative infinity.
http://www.deeplearningbook.org/contents/mlp.html p.175

When I first read this I was quite puzzled; I had seemingly never seen a negative cross-entropy. We know that cross-entropy can be written as the sum of entropy and KL divergence:

$$H(p, q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q)$$

where $D_{\mathrm{KL}}(p\|q)$ is non-negative. The proof is simple:
$$D_{\mathrm{KL}}(p\|q) = \mathbb{E}_p\!\left[\log\frac{p}{q}\right] = -\mathbb{E}_p\!\left[\log\frac{q}{p}\right] \ge -\log\mathbb{E}_p\!\left[\frac{q}{p}\right] = -\log\!\left(\int p\,\frac{q}{p}\right) = -\log 1 = 0$$

The inequality in the middle step comes from Jensen's inequality.
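
As a quick numerical sanity check (a minimal NumPy sketch; the Dirichlet-sampled distributions below are my own illustration, not from the book), we can verify both that $D_{\mathrm{KL}}(p\|q) \ge 0$ and that the decomposition $H(p, q) = H(p) + D_{\mathrm{KL}}(p\|q)$ holds for discrete distributions:

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    """Discrete entropy H(p) = -sum p log p (natural log)."""
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    """Discrete cross-entropy H(p, q) = -sum p log q."""
    return -np.sum(p * np.log(q))

def kl(p, q):
    """KL divergence D_KL(p || q) = sum p log(p / q)."""
    return np.sum(p * np.log(p / q))

for _ in range(5):
    # Random strictly positive distributions over 10 outcomes.
    p = rng.dirichlet(np.ones(10))
    q = rng.dirichlet(np.ones(10))
    assert kl(p, q) >= 0                                            # Jensen's inequality
    assert np.isclose(cross_entropy(p, q), entropy(p) + kl(p, q))   # H(p,q) = H(p) + KL
```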

As for $H(p)$, I vaguely remembered that entropy should also be non-negative:

$$H(p) = \sum_x p(x)\log\frac{1}{p(x)}$$

Since $p(x)$ is a probability it must lie in $[0, 1]$, so $\log\frac{1}{p(x)} \ge 0$, and therefore $H(p) \ge 0$.
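
For instance (a small sketch with arbitrarily chosen distributions): the entropy of a uniform distribution over four outcomes is $\log 4 \approx 1.386$, and even a nearly deterministic distribution has entropy close to zero, but never below it:

```python
import numpy as np

def entropy(p):
    # H(p) = sum_x p(x) log(1 / p(x)); the term 0 * log(1/0) is taken as 0.
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log(1.0 / p[nz]))

print(entropy([0.25, 0.25, 0.25, 0.25]))   # log 4 ≈ 1.386
print(entropy([0.999, 0.001]))             # ≈ 0.0079, small but still >= 0
print(entropy([1.0, 0.0]))                 # exactly 0, the minimum
```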

By this reasoning, cross-entropy should always be non-negative, so how could it possibly approach negative infinity?

Read the sentence mentioning negative infinity again, carefully.

For real-valued output variables, if the model can control the density of the output distribution (for example, by learning the variance parameter of a Gaussian output distribution) then it becomes possible to assign extremely high density to the correct training set outputs, resulting in cross-entropy approaching negative infinity.

It turns out the key phrase is real-valued output variables. For continuous random variables, the entropy (differential entropy) has to be written as an integral:

$$H(p) = \int_x p(x)\log\frac{1}{p(x)}\,dx$$

Here $p$ is a probability density, whose values range over $[0, +\infty)$, so this integral no longer necessarily has a lower bound (the log term is negative wherever $p(x) > 1$). In the extreme case where $p(x)$ is a Dirac delta function, the integral is $-\infty$.
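
To make this concrete (a sketch using a Gaussian density as my own example): the differential entropy of $N(0, \sigma^2)$ is $\frac{1}{2}\log(2\pi e\sigma^2)$, which is already negative for modest $\sigma$ and decreases without bound as $\sigma \to 0$, i.e. as the density approaches a Dirac delta:

```python
import numpy as np

def gaussian_diff_entropy(sigma):
    """Differential entropy of N(0, sigma^2): 0.5 * log(2 * pi * e * sigma^2)."""
    return 0.5 * np.log(2 * np.pi * np.e * sigma**2)

for sigma in [1.0, 0.5, 0.1, 1e-3, 1e-6]:
    print(f"sigma = {sigma:<8} H = {gaussian_diff_entropy(sigma):+.3f} nats")
# sigma = 1.0 gives +1.419; the entropy is already negative at sigma = 0.1 (-0.884),
# and it keeps decreasing without bound as sigma -> 0.
```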

So for continuous random variables, entropy can be negative.

Now look at the definition of cross-entropy:

$$H(p, q) = \int_x p(x)\log\frac{1}{q(x)}\,dx$$

If we likewise let $q(x)$ pile up arbitrarily high density on the points where $p$ puts its mass, with a Dirac delta function as the limiting case, then $\log\frac{1}{q(x)}$ goes to $-\infty$ there and the cross-entropy also tends to negative infinity.
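
The book's Gaussian-output example is easy to reproduce numerically (a minimal sketch; the training targets below are made up): if the model's predicted mean matches each target exactly and we shrink the learned standard deviation, the average negative log-likelihood, i.e. the cross-entropy between the empirical distribution and the model density, drops below zero and heads toward $-\infty$:

```python
import numpy as np

# Hypothetical training targets; assume the model's mean already matches them exactly.
y = np.array([0.3, -1.2, 2.5, 0.0])

def gaussian_nll(y, mu, sigma):
    """Average negative log-likelihood under N(mu, sigma^2) = empirical cross-entropy."""
    return np.mean(0.5 * np.log(2 * np.pi * sigma**2) + (y - mu)**2 / (2 * sigma**2))

for sigma in [1.0, 0.1, 1e-3, 1e-6]:
    print(f"sigma = {sigma:<8} cross-entropy = {gaussian_nll(y, y, sigma):+.3f}")
# With mu = y the squared-error term vanishes, so the loss is 0.5 * log(2*pi*sigma^2),
# which is unbounded below as sigma -> 0.
```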

This is exactly what the second half of the quoted passage is getting at.

it becomes possible to assign extremely high density to the correct training set outputs, resulting in cross-entropy approaching negative infinity.

So the conclusion is:

  • For discrete random variables, cross-entropy is non-negative. If your classification setup with softmax + cross_entropy_loss ever produces a negative loss, the computation is wrong (see the quick check below).
  • For continuous random variables, cross-entropy can be negative.
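
As a quick check of the first point (a plain-NumPy sketch, not any particular framework's implementation): with a one-hot target, the softmax cross-entropy is $-\log$ of a probability in $(0, 1]$, so it cannot be negative:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_loss(logits, labels):
    """Mean -log softmax(logits)[label]; always >= 0 because probabilities are <= 1."""
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

logits = rng.normal(size=(1000, 10)) * 10.0   # arbitrary random logits
labels = rng.integers(0, 10, size=1000)
loss = cross_entropy_loss(logits, labels)
print(loss)            # some positive number
assert loss >= 0.0     # a negative value here would indicate a bug in the computation
```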