Hinton Neural Networks Course Notes, Quiz 3: A Nonlinear Neural Network with a Hidden Layer Cannot Necessarily Learn a Linear Function


This came up while doing Quiz 3. I found a close match for it on the course forum; the answer feels a bit like a play on words, but it does expose a genuine blind spot, so I am recording it here.


Question

Consider a neural network with one layer of logistic hidden units (intended to be fully connected to the input units) and a linear output unit. Suppose there are n input units and m hidden units. Which of the following statements are true? Check all that apply.

Option

As long as m≥1, this network can learn to compute any function that can be learned by a network without any hidden layers (with the same inputs).

Answer and Explanation

[ This should not be selected ]

If the weights into the hidden layer are very small, and the weights out of it are large (to compensate), then the hidden units behave like linear units, which makes lots of things possible.
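To make the quoted feedback concrete (this elaboration is mine, not part of the official explanation), expand the logistic function $\sigma(z)=1/(1+e^{-z})$ around zero:

$$
\sigma(\varepsilon x)=\frac{1}{2}+\frac{\varepsilon x}{4}-\frac{(\varepsilon x)^3}{48}+O\big((\varepsilon x)^5\big)
\;\;\Longrightarrow\;\;
\frac{4}{\varepsilon}\,\sigma(\varepsilon x)-\frac{2}{\varepsilon}=x-\frac{\varepsilon^2 x^3}{12}+O\big(\varepsilon^4 x^5\big).
$$

So with input weight $\varepsilon$, output weight $4/\varepsilon$, and output bias $-2/\varepsilon$, a single hidden unit computes $x$ up to a residual of order $\varepsilon^2 x^3$: excellent while $\varepsilon x$ is small, but never exactly linear, and increasingly wrong as $|x|$ grows.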

Detailed Analysis

The official explanation clearly reads as support for selecting the option, yet the expected answer is "not selected". Puzzled, I went to the forum and found the resolution: although you can imitate a linear function in the way the explanation describes, the imitation is never exact, and it only works over a limited input range (once the inputs get large, the imitation breaks down).
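A minimal numerical check of this point (my own sketch; the value of eps and the test inputs are arbitrary choices, not from the quiz):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One logistic hidden unit trying to imitate the identity y = x:
# tiny input weight eps, large output weight 4/eps, output bias -2/eps.
eps = 1e-3

def approx_identity(x):
    h = sigmoid(eps * x)                 # hidden activation, close to 0.5
    return (4.0 / eps) * h - 2.0 / eps   # rescale back towards x

for x in [0.1, 1.0, 10.0, 1_000.0, 10_000.0]:
    y = approx_identity(x)
    print(f"x = {x:>9.1f}  network = {y:>10.3f}  |error| = {abs(y - x):.3g}")
```

For small x the error is tiny (of order eps**2 * x**3), but once eps * x is no longer small the sigmoid saturates and the "linear" behaviour collapses, which is exactly the input-range restriction mentioned above.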
There was a discussion thread on exactly this point: a neural network with a nonlinear hidden layer should be more powerful than a single layer of linear neurons, so why can the former not learn the latter's function? It is quoted below:

Michael Dougherty
Thanks for responding. I already retook the test, so I don’t have access to my original question, but I’m pretty sure it was about logistic hidden neurons. Even so, the answer expected here is implying that a network with a layer of at least one hidden neuron (logistic or not) is LESS capable of learning than a network without a hidden layer. Is that true?
Boris Knyazev
In my opinion, it’s not true and the answer does not imply that “that a network with a layer of at least one hidden neuron (logistic or not) is LESS capable of learning than a network without a hidden layer”.
As you probably know, the network with a logistic hidden layer will perform a nonlinear transformation of the data. If the weights of this layer are close to zero, it will be an approximately linear transformation, but still nonlinear.
Now, you think that if the network is nonlinear, then it can learn to compute more functions than a linear network. It is true indeed. But “more” does not imply “any”. For example, this nonlinear network will not be able to learn to compute a linear function.
So, it's not less capable; it's very capable, but that can sometimes be a disadvantage. Therefore, for some tasks a linear network can do a better job. We can also use early stopping to prevent the model from becoming too nonlinear.

This exchange distinguishes between being able to solve a class of problems and being able to learn a specific function, and it states outright that a nonlinear network with a fixed number of hidden units cannot learn (exactly compute) a linear function. I agree only under that fixed-architecture premise: with no limit on the number of hidden units, the universal approximation theorem shows that a two-layer network with sigmoid hidden units can approximate any continuous function to arbitrary accuracy on a bounded input domain, though "approximate" is the operative word, not exactly represent or learn. Going further, consider the regularized case: a two-layer nonlinear network with only a few hidden units, trained with a regularization term to fit a linear function, may not end up fitting it well at all; a quick sketch follows below. The lesson is twofold: do not over-apply regularization, and do not assume that a more complex model is invincible.
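Here is a quick empirical sketch of that regularized scenario (using scikit-learn's MLPRegressor; the architecture, the alpha value, and the target function 3x are my own arbitrary choices, and the exact numbers will vary with initialization):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X_train = rng.uniform(-1.0, 1.0, size=(200, 1))
y_train = 3.0 * X_train.ravel()          # a plain linear target

# A few logistic hidden units, linear output, noticeable L2 penalty (alpha).
net = MLPRegressor(hidden_layer_sizes=(4,), activation="logistic",
                   alpha=1.0, solver="lbfgs", max_iter=5000, random_state=0)
net.fit(X_train, y_train)

for x in [0.5, 1.0, 5.0, 20.0]:
    pred = net.predict([[x]])[0]
    print(f"x = {x:>5}: target = {3.0 * x:>5.1f}, prediction = {pred:8.3f}")
```

Inside the training range the fit is typically passable; outside it the logistic units saturate and the predictions flatten out, whereas a hidden-layer-free linear model would extrapolate exactly.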
