Deep Learning, Advanced (6): Debugging a CNN, a Summary of the Errors Along the Way

Source: Internet  Editor: 程序博客网  Time: 2024/06/11 11:48

A summary of today's learning.

(Note: my mood right now is completely different from what it was just three minutes ago.)

(Yesterday I was stuck on some errors. Today I regained my confidence and reconfigured the GPU environment, which failed badly. But things are fine now: while going over what I had done today, it occurred to me that maybe I was calling the wrong entry point. Sure enough, that guess turned out to be right, and my mood recovered.)

The biggest lesson: when the code won't run, try reading the code first.

At the very least, start from the entry-point call and read it carefully and accurately. Don't manufacture an error that practically nobody else on the internet has ever hit, and, worse still, one you can't even understand yourself.


Sigh. What a soul-crushing day.

A summary of why yesterday failed:

1. Encoding problems.

Used TextEncoding.exe to convert the files under E:\Python\CRDA\cuda8\include to Unicode.

2. Something about macro definitions. I've forgotten the fix; I may be reinstalling for a classmate tomorrow and will fill this in then.

3. CUDA 9 conflicted with CUDA 8 in VS2015 (VS2015 was picking up CUDA 9). I uninstalled both CUDA 9 and CUDA 8 and did a clean reinstall of CUDA 8.

4. Build and run a CUDA Runtime sample project in VS2015 to generate some supposedly necessary files. I was honestly just fumbling here, but it made a lot of errors go away. Ideally do a Debug build for both 64-bit and 32-bit.

5. While building the CUDA runtime project, you may see errors along the lines of "cannot open" or "cannot find" some FIB-like files; answers for those can be found on Baidu. One piece of advice: do not fixate on messages about some nvxxxx.dll file not being found or failing to load and then go off to a DOS window running things like regsvr32 C:\Windows\SysWOW64\nvcuda.dll. That is a very foolish dead end.

You will then get "The module was loaded but the entry point DllRegisterServer was not found." (regsvr32 only registers COM servers that actually export DllRegisterServer; nvcuda.dll is an ordinary driver DLL, so registering it can never succeed.)

You can Baidu and Google this to your heart's content, and it will drive you to despair. Unless you are very familiar with DLLs or C, you will burn out here.

As long as it isn't an nvxxx.dll, it can be ignored.

6. Various other code problems, which a quick Baidu search can resolve.

One of them concerns downsample versus pool_2d. This is really a Theano API change rather than a Python 3 one: the old theano.tensor.signal.downsample.max_pool_2d was replaced by pool_2d from theano.tensor.signal.pool.

pooled_out = pool_2d(input=conv_out, ws=self.poolsize, ignore_border=True)

Note the keyword is ws, not ds; with ds you get this warning:

 UserWarning: DEPRECATION: the 'ds' parameter is not going to exist anymore as it is going to be replaced by the parameter 'ws'.
  pooled_out = pool_2d(input=conv_out, ds=self.poolsize, ignore_border=True)


The conv2d import also needs attention; in some Theano versions it is:

from theano.tensor.nnet import conv2d

conv_out = conv2d(
            input=self.inpt, filters=self.w, filter_shape=self.filter_shape,
            input_shape=self.image_shape)
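With these API changes out of the way, it's worth double-checking the shapes the layers will produce. For a "valid" convolution followed by pool_2d with ignore_border=True, the output sizes can be worked out by hand. The sketch below is plain Python; the 28x28 input, 5x5 filter, and 2x2 pool sizes are assumptions matching the MNIST network used here:

```python
def conv_valid_out(in_size, filter_size):
    # "valid" convolution: the filter must fit entirely inside the input
    return in_size - filter_size + 1

def pool_out(in_size, ws):
    # pool_2d with ignore_border=True discards any partial pooling window
    return in_size // ws

# Assumed setup: 28x28 MNIST images, 5x5 filters, 2x2 max-pooling
conv_size = conv_valid_out(28, 5)   # 24
pool_size = pool_out(conv_size, 2)  # 12
print((pool_size, pool_size))
```

This is where the 12x12 feature maps seen in the traceback below come from.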


7. There was also a serious error involving the cuDNN version.

With cudnn-8.0-windows7-x64-v7 I got an error about a _v4 symbol, something like cudnnGetPoolingNdDescriptor_v4 not being found. Replacing cudnn-8.0-windows7-x64-v7 with cudnn-8.0-windows7-x64-v5.1 resolved both the warning and the error.

I can't recall the other errors at the moment; there were simply far too many.


Finally, today's biggest and most foolish mistake.

At the very least, start from the entry-point call and read it carefully and accurately. Don't manufacture an error that practically nobody else on the internet has ever hit, and, worse still, one you can't even understand yourself.

The error it caused:

Trying to run under a GPU.  If this is not desired, then modify network3.py to set the GPU flag to False.
(<CudaNdarrayType(float32, matrix)>, Elemwise{Cast{int32}}.0)
cost: Elemwise{add,no_inplace}.0
grads: [Elemwise{add,no_inplace}.0, GpuFromHost.0, Elemwise{add,no_inplace}.0, GpuFromHost.0, Elemwise{add,no_inplace}.0, GpuFromHost.0]
updates: [(<CudaNdarrayType(float32, 4D)>, Elemwise{sub,no_inplace}.0), (<CudaNdarrayType(float32, vector)>, Elemwise{sub,no_inplace}.0), (w, Elemwise{sub,no_inplace}.0), (b, Elemwise{sub,no_inplace}.0), (w, Elemwise{sub,no_inplace}.0), (b, Elemwise{sub,no_inplace}.0)]
train_mb: <theano.compile.function_module.Function object at 0x000000002290B0B8>
Training mini-batch number 0
Traceback (most recent call last):
  File "E:\Python\Anaconda3\lib\site-packages\theano\compile\function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
ValueError: GpuReshape: cannot reshape input of shape (10, 20, 12, 12) to shape (10, 784).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\Python\NewPythonData\neural-networks-and-deep-learning\src\demo3.py", line 38, in <module>
    net.SGD(training_data,10,mini_batch_size,0.1,validation_data,test_data)
  File "E:\Python\NewPythonData\neural-networks-and-deep-learning\src\network3.py", line 179, in SGD
    train_mb(minibatch_index)
  File "E:\Python\Anaconda3\lib\site-packages\theano\compile\function_module.py", line 898, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "E:\Python\Anaconda3\lib\site-packages\theano\gof\link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "E:\Python\Anaconda3\lib\site-packages\six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "E:\Python\Anaconda3\lib\site-packages\theano\compile\function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
ValueError: GpuReshape: cannot reshape input of shape (10, 20, 12, 12) to shape (10, 784).
Apply node that caused the error: GpuReshape{2}(GpuElemwise{add,no_inplace}.0, TensorConstant{[ 10 784]})
Toposort index: 51
Inputs types: [CudaNdarrayType(float32, 4D), TensorType(int32, vector)]
Inputs shapes: [(10, 20, 12, 12), (2,)]
Inputs strides: [(2880, 144, 12, 1), (4,)]
Inputs values: ['not shown', array([ 10, 784])]
Outputs clients: [[GpuElemwise{Composite{(scalar_sigmoid(i0) * i1)},no_inplace}(GpuReshape{2}.0, GpuFromHost.0)]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
I won't go into the details of the error here.

If you hit something similar, I'd suggest re-checking from the top whether the parameters you passed to the convolutional, pooling, and fully connected layers are wrong.
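The numbers in the traceback already point at the culprit: a batch of shape (10, 20, 12, 12) cannot be flattened to (10, 784), because 20 * 12 * 12 = 2880, not 784. And 784 = 28 * 28 is the raw image size, which hints that a later layer was still sized for the raw input. (That diagnosis is my reading of the traceback, not something the log states directly.) A quick sanity check in plain Python:

```python
# Shape reported in the traceback: (batch, channels, height, width)
batch, channels, h, w = 10, 20, 12, 12

flat = channels * h * w      # elements per sample after flattening the conv output
print(flat)                  # 2880

# 784 = 28 * 28 is the raw MNIST image size; the reshape target (10, 784)
# suggests a layer was still sized for the raw input, not the conv output.
print(28 * 28)               # 784
assert flat != 28 * 28       # hence the GpuReshape failure
```

Whatever parameter feeds the fully connected layer's input size must equal the flattened convolutional output, here 2880.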

After the fix, the result was thrilling:

Trying to run under a GPU.  If this is not desired, then modify network3.py to set the GPU flag to False.
(<CudaNdarrayType(float32, matrix)>, Elemwise{Cast{int32}}.0)
cost: Elemwise{add,no_inplace}.0
grads: [Elemwise{add,no_inplace}.0, GpuFromHost.0, Elemwise{add,no_inplace}.0, GpuFromHost.0, Elemwise{add,no_inplace}.0, GpuFromHost.0]
updates: [(<CudaNdarrayType(float32, 4D)>, Elemwise{sub,no_inplace}.0), (<CudaNdarrayType(float32, vector)>, Elemwise{sub,no_inplace}.0), (w, Elemwise{sub,no_inplace}.0), (b, Elemwise{sub,no_inplace}.0), (w, Elemwise{sub,no_inplace}.0), (b, Elemwise{sub,no_inplace}.0)]
train_mb: <theano.compile.function_module.Function object at 0x0000000022A18208>
Training mini-batch number 0
Training mini-batch number 1000
Training mini-batch number 2000
Training mini-batch number 3000
Training mini-batch number 4000
Epoch 0: validation accuracy 93.73%
This is the best validation accuracy to date.
The corresponding test accuracy is 93.20%



The examples/plot_mlp.py script in scikit-neuralnetwork

also tested successfully.

cd into the scikit-neuralnetwork source directory (get the source from GitHub: https://github.com/aigamedev/scikit-neuralnetwork) and run:

python examples/plot_mlp.py --params activation

The result:



