GPU computing with Ubuntu 14.04 + Theano + OpenCL + libgpuarray
This post is reproduced from: http://blog.csdn.net/majordong100/article/details/51859994
The blog has moved to Marcovaldo's blog (http://marcovaldong.github.io/).
The previous post showed how to tackle Kaggle's handwritten digit recognition with Theano and logistic regression, and ended by noting that CPU computation was far too slow. After finishing that experiment I went through the Theano documentation and learned that Theano officially supports GPU computation only through CUDA, not OpenCL; in other words, only NVIDIA cards are supported out of the box. The reason is that CUDA and OpenCL are two different GPU computing platforms: CUDA runs only on NVIDIA cards, while OpenCL supports GPUs from all vendors (look up the finer differences yourself if you are interested). Unfortunately my laptop has an Intel integrated GPU and an entry-level discrete AMD card. Theano does, however, unofficially support OpenCL through libgpuarray, so I spent a great deal of time trying to install it.
libgpuarray supports Debian 6, Ubuntu 14.04, Mac OS X 10.11, and Windows 7. The only successful installation reports I could find online are two posts, both on Mac OS; links below for later readers:
https://www.robberphex.com/2016/05/521
http://codechina.org/2016/04/how-to-install-theano-on-mac-os-x-ei-caption-with-opencl-support/
My original OS was Windows 7, and nearly all my free time in June went into installing libgpuarray there. I hit countless pitfalls and, in the end, still never got it working. Here is the environment you would need to install libgpuarray on Windows 7, for later readers' reference:
- The latest AMD graphics driver; see the AMD website for details
- AMD APP SDK, which provides OpenCL
- CMake >= 3.0 (cmake)
- g++, usually installed via MinGW or TDM-GCC
- Visual Studio
- clBLAS (clblas)
- libcheck
In July I set up a Windows 7 / Ubuntu 14.04 dual boot and tried to get Theano + OpenCL GPU computation working on Ubuntu. libgpuarray did, more or less, install successfully, but I still cannot compute on the AMD card; the remaining problem is described at the end. The whole process follows.
Installing Ubuntu 14.04 alongside Windows 7
My Windows 7 / Ubuntu 14.04 dual-boot install followed http://m.blog.csdn.net/article/details?id=43987599. The process is straightforward, so I will not repeat it here.
Installing the AMD graphics driver
This is where I got stuck at first. I broke the AMD driver install several times; a broken install means the machine will not boot into the graphical desktop after restarting, leaving you to repair it from a tty or initramfs, which was far too hard for someone touching Linux for the first time. Often the "repair" still did not work and I had to reinstall the OS; I reinstalled seven or eight times in total. Here is an installation method that is simple and fast (at least it worked for me on the first try).
Right after installing Ubuntu 14.04, the first thing to do is switch drivers. Open the Additional Drivers tool; the system starts out on the open-source driver, so select the proprietary driver from fglrx, click "Apply Changes", and wait quietly for it to finish, then reboot.
After rebooting, open a terminal and run fglrxinfo; it should print your graphics card information, like this:
```
marcovaldo@marcovaldong:~$ fglrxinfo
display: :0  screen: 0
OpenGL vendor string: Advanced Micro Devices, Inc.
OpenGL renderer string: AMD Radeon HD 7400M Series
OpenGL version string: 4.5.13399 Compatibility Profile Context 15.201.1151
```
Then run fgl_glxgears in the terminal; a test window pops up (a spinning cube), which confirms the driver installed correctly. Here are two good posts on driver installation for later readers:
http://forum.ubuntu.org.cn/viewtopic.php?t=445434
http://www.tuicool.com/articles/6N3e2ir
Installing the AMD APP SDK
Download the SDK from the AMD website (mind your OS and architecture); I used the 64-bit Linux AMD APP SDK 3.0. Unpacking the archive yields a .sh file; run it from a terminal:
```
sudo sh AMD-APP-SDK-v3.0.130.136-GA-linux64.sh
```
By default the SDK installs under /opt/. Afterwards, running the clinfo command in a terminal prints the OpenCL platform and compute-device information; here is the output on my laptop:
```
marcovaldo@marcovaldong:~$ clinfo
Number of platforms: 1
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 2.0 AMD-APP (1800.11)
  Platform Name: AMD Accelerated Parallel Processing
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

  Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
  Device Type: CL_DEVICE_TYPE_GPU
  Vendor ID: 1002h
  Board name: AMD Radeon HD 7400M Series
  Device Topology: PCI[ B#1, D#0, F#0 ]
  Max compute units: 2
  Max work items dimensions: 3
    Max work items[0]: 256
    Max work items[1]: 256
    Max work items[2]: 256
  Max work group size: 256
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 4
  Preferred vector width double: 0
  Native vector width char: 16
  Native vector width short: 8
  Native vector width int: 4
  Native vector width long: 2
  Native vector width float: 4
  Native vector width double: 0
  Max clock frequency: 700Mhz
  Address bits: 32
  Max memory allocation: 134217728
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 8
  Max image 2D width: 16384
  Max image 2D height: 16384
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 2048
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: None
  Cache line size: 0
  Cache size: 0
  Global memory size: 536870912
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 32768
  Max pipe arguments: 0
  Max pipe active reservations: 0
  Max pipe packet size: 0
  Max global variable size: 0
  Max global variable preferred total size: 0
  Max read/write image args: 0
  Max on device events: 0
  Queue on device max size: 0
  Max on device queues: 0
  Queue on device preferred size: 0
  SVM capabilities:
    Coarse grain buffer: No
    Fine grain buffer: No
    Fine grain system: No
    Atomics: No
  Preferred platform atomic alignment: 0
  Preferred global atomic alignment: 0
  Preferred local atomic alignment: 0
  Kernel Preferred work group size multiple: 64
  Error correction support: 0
  Unified memory for Host and Device: 0
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue on Host properties:
    Out-of-Order: No
    Profiling: Yes
  Queue on Device properties:
    Out-of-Order: No
    Profiling: No
  Platform ID: 0x7f98e6833430
  Name: Caicos
  Vendor: Advanced Micro Devices, Inc.
  Device OpenCL C version: OpenCL C 1.2
  Driver version: 1800.11
  Profile: FULL_PROFILE
  Version: OpenCL 1.2 AMD-APP (1800.11)
  Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_amd_image2d_from_buffer_read_only cl_khr_spir cl_khr_gl_event

  Device Type: CL_DEVICE_TYPE_CPU
  Vendor ID: 1002h
  Board name:
  Max compute units: 4
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 1024
  Max work group size: 1024
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 8
  Preferred vector width double: 4
  Native vector width char: 16
  Native vector width short: 8
  Native vector width int: 4
  Native vector width long: 2
  Native vector width float: 8
  Native vector width double: 4
  Max clock frequency: 2299Mhz
  Address bits: 64
  Max memory allocation: 2147483648
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 64
  Max image 2D width: 8192
  Max image 2D height: 8192
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 4096
  Alignment (bits) of base address: 1024
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: Yes
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 32768
  Global memory size: 6161788928
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Global
  Local memory size: 32768
  Max pipe arguments: 16
  Max pipe active reservations: 16
  Max pipe packet size: 2147483648
  Max global variable size: 1879048192
  Max global variable preferred total size: 1879048192
  Max read/write image args: 64
  Max on device events: 0
  Queue on device max size: 0
  Max on device queues: 0
  Queue on device preferred size: 0
  SVM capabilities:
    Coarse grain buffer: No
    Fine grain buffer: No
    Fine grain system: No
    Atomics: No
  Preferred platform atomic alignment: 0
  Preferred global atomic alignment: 0
  Preferred local atomic alignment: 0
  Kernel Preferred work group size multiple: 1
  Error correction support: 0
  Unified memory for Host and Device: 1
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: Yes
  Queue on Host properties:
    Out-of-Order: No
    Profiling: Yes
  Queue on Device properties:
    Out-of-Order: No
    Profiling: No
  Platform ID: 0x7f98e6833430
  Name: Intel(R) Core(TM) i3-2350M CPU @ 2.30GHz
  Vendor: GenuineIntel
  Device OpenCL C version: OpenCL C 1.2
  Driver version: 1800.11 (sse2,avx)
  Profile: FULL_PROFILE
  Version: OpenCL 1.2 AMD-APP (1800.11)
  Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event
```
You also need to add environment variables to your ~/.bashrc, as follows:
```
# AMD APP SDK
export AMDAPPSDKROOT="/opt/AMDAPPSDK-3.0"
export AMDAPPSDKSAMPLESROOT="/opt/AMDAPPSDK-3.0/"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:"/opt/AMDAPPSDK-3.0/lib/x86_64":"/opt/AMDAPPSDK-3.0/lib/x86"
export ATISTREAMSDKROOT=$AMDAPPSDKROOT
```
That completes the AMD APP SDK installation. Here are the posts I referenced:
https://www.blackmoreops.com/2013/11/22/install-amd-app-sdk-kali-linux/
http://blog.csdn.net/vblittleboy/article/details/8979288
Upgrading Python
Ubuntu 14.04 ships with Python 2.7.6; I upgraded it to 2.7.11 with these three commands in a terminal:
```
sudo add-apt-repository ppa:fkrull/deadsnakes-python2.7
sudo apt-get update
sudo apt-get upgrade
```
Installing libgpuarray
To keep mistakes during installation from contaminating the system-wide Python environment, we work inside a Python virtualenv.
```
sudo apt-get install python-virtualenv
sudo apt-get install python-pip
virtualenv venv
source venv/bin/activate
```
We are now inside the virtual environment venv; everything below happens inside it. First install the dependencies of Theano and libgpuarray; see the libgpuarray official documentation for the exact requirements.
```
pip install numpy
pip install Cython
pip install scipy
```
pip may error out while installing scipy; the following link may help you fix it:
http://stackoverflow.com/questions/11114225/installing-scipy-and-numpy-using-pip
Next, install Theano. Note that the stable Theano 0.8.2 release is out of sync with libgpuarray and will raise errors at runtime (details at the end of this post), so I installed the development version, Theano 0.9.0dev:
```
pip install git+https://github.com/Theano/Theano.git
# I actually used robberphex's CSDN mirror -- thanks!
# pip install git+https://code.csdn.net/u010096836/theano.git
```
libcheck is also needed here, so install it:
```
sudo apt-get install check
```
Now build and install libgpuarray itself:
```
git clone https://github.com/Theano/libgpuarray.git
cd libgpuarray
mkdir Build
cd Build
cmake .. -DCMAKE_INSTALL_PREFIX=../venv/ -DCMAKE_BUILD_TYPE=Release
make install
export LIBRARY_PATH=$LIBRARY_PATH:$PWD/../venv/lib
export CPATH=$CPATH:$PWD/../venv/
cd ..   # back to the source root, where setup.py lives
python setup.py build
python setup.py install
```
Time to test. Theano's documentation provides a test program; save it as test.py:
```python
from theano import function, config, shared, tensor, sandbox
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')
```
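The CPU/GPU check at the bottom of test.py only inspects the class names of the ops in the compiled graph: if any elementwise op's class name lacks "Gpu", the graph ran on the CPU. The idea in isolation, on hypothetical stand-in op classes (my simplified illustration, not Theano's code):

```python
# Stand-in op classes; in Theano these would be tensor.Elemwise and its
# GPU counterpart, whose class name starts with "Gpu".
class Elemwise(object):
    pass

class GpuElemwise(object):
    pass

def used_cpu(ops):
    """Mimic test.py's name-based check over a list of op objects."""
    return any('Gpu' not in type(op).__name__ for op in ops)

print(used_cpu([Elemwise()]))     # -> True  (CPU graph)
print(used_cpu([GpuElemwise()]))  # -> False (GPU graph)
```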
First, Theano on the CPU only; the result:
```
(venv)marcovaldo@marcovaldong:~/desktop$ python test.py
[Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
Looping 1000 times took 7.7898850441 seconds
Result is [ 1.23178032  1.61879341  1.52278065 ...,  2.20771815  2.29967753  1.62323285]
Used the cpu
```
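For a rough point of comparison (my addition, not in the original post), the same exp-over-a-vector benchmark can be timed in plain NumPy, without Theano, using the same `vlen` and `iters`:

```python
import time
import numpy

# Same sizes as test.py above
vlen = 10 * 30 * 768
iters = 1000

rng = numpy.random.RandomState(22)
x = rng.rand(vlen).astype('float32')

t0 = time.time()
for _ in range(iters):
    r = numpy.exp(x)  # the whole Theano graph is just this one elementwise op
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
```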
Then with THEANO_FLAGS=mode=FAST_RUN added:
```
(venv)marcovaldo@marcovaldong:~/desktop$ THEANO_FLAGS=mode=FAST_RUN,floatX=float32 python test.py
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.86811089516 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761  1.62323284]
Used the cpu
(venv)marcovaldo@marcovaldong:~/desktop$ THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python test.py
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.84727883339 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761  1.62323284]
Used the cpu
```
Using OpenCL, however, it errors out. I have not found a working fix online; if anyone has been through this, please point me in the right direction. The output:
```
(venv)marcovaldo@marcovaldong:~/desktop$ THEANO_FLAGS=mode=FAST_RUN,device=opencl0:0,floatX=float32 python test.py
ERROR (theano.sandbox.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/home/marcovaldo/myvenv/venv/local/lib/python2.7/site-packages/theano/sandbox/gpuarray/__init__.py", line 96, in <module>
    init_dev(config.device)
  File "/home/marcovaldo/myvenv/venv/local/lib/python2.7/site-packages/theano/sandbox/gpuarray/__init__.py", line 47, in init_dev
    "Make sure Theano and libgpuarray/pygpu "
RuntimeError: ('Wrong major API version for gpuarray:', -9997, 'Make sure Theano and libgpuarray/pygpu are in sync.')
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.86138486862 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761  1.62323284]
Used the cpu
```
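As an aside, the `device=opencl0:0` flag names OpenCL platform 0, device 0 (and `cudaN` would pick CUDA device N). A toy parser of that naming convention, purely for illustration (this is not libgpuarray's actual parsing code):

```python
def parse_device(name):
    """Split a libgpuarray-style device string into (backend, platform, device).

    A sketch for illustration only; libgpuarray does its own parsing internally.
    """
    if name.startswith('opencl'):
        # e.g. 'opencl0:0' -> platform 0, device 0
        platform, device = name[len('opencl'):].split(':')
        return ('opencl', int(platform), int(device))
    if name.startswith('cuda'):
        # e.g. 'cuda1' -> device 1; bare 'cuda' defaults to device 0
        suffix = name[len('cuda'):]
        return ('cuda', None, int(suffix) if suffix else 0)
    raise ValueError('unrecognized device string: %r' % name)

print(parse_device('opencl0:0'))  # -> ('opencl', 0, 0)
print(parse_device('cuda0'))      # -> ('cuda', None, 0)
```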
At this point, if you do not hit the errors below, your libgpuarray installation should be complete.
```
RuntimeError: ('Wrong major API version for gpuarray:', -9997, 'Make sure Theano and libgpuarray/pygpu are in sync.')
RuntimeError: ('Wrong major API version for gpuarray:', -9998, 'Make sure Theano and libgpuarray/pygpu are in sync.')
```
Next, I plan to find time to translate libgpuarray's official installation documentation for later readers.
Today's deep learning tools all officially support only NVIDIA cards, which leaves AMD cards at a real disadvantage. I hope each framework adds AMD-capable APIs soon.
Finally, my thanks to robberphex and Tinyfool, whose blogs gave me the ideas I needed.
References
- http://deeplearning.net/software/libgpuarray/installation.html
- https://www.robberphex.com/2016/05/521
- http://codechina.org/2016/04/how-to-install-theano-on-mac-os-x-ei-caption-with-opencl-support/
- http://m.blog.csdn.net/article/details?id=43987599
- http://forum.ubuntu.org.cn/viewtopic.php?t=445434
- http://www.tuicool.com/articles/6N3e2ir
- https://www.blackmoreops.com/2013/11/22/install-amd-app-sdk-kali-linux/
- http://blog.csdn.net/vblittleboy/article/details/8979288
- http://blog.csdn.net/zahuopuboss/article/details/50927432
- http://stackoverflow.com/questions/27971707/using-pythontheano-with-opencl-in-an-amd-gpu
- http://stackoverflow.com/questions/11114225/installing-scipy-and-numpy-using-pip