Configuring TensorFlow 1.0 (GPU version) on Ubuntu 16.04

Source: Internet | Editor: 程序博客网 | Time: 2024/06/06 00:09

requirements

  • python 2.7
  • Flask
  • TensorFlow, GPU version

Install the NVIDIA driver

After falling into one pitfall after another, I finally found a reliable method via Google. First, check your NVIDIA VGA card model:

    sudo lshw -numeric -C display

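(Screenshot: NVIDIA-DISPLAYCARD)
This shows your graphics card information. Mine, for example, is product: GM107M [GeForce GTX 950M] [10DE:139A]. Then go to the NVIDIA driver search page and look up the driver version your card needs. The page looks like this:
(Screenshot: gtx-search, the NVIDIA driver search page)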
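The lshw output can also be inspected programmatically. The sketch below extracts the product name and the trailing [vendor:device] PCI ID from the product: line. It is a minimal sketch that assumes the -numeric output format shown above; only the product line is taken from my machine, and the function names are hypothetical. 10DE is the PCI vendor ID registered to NVIDIA.

```python
import re

# Matches the "product:" line printed by `sudo lshw -numeric -C display`,
# e.g. "product: GM107M [GeForce GTX 950M] [10DE:139A]".
# The last bracketed pair is the PCI vendor:device ID.
PRODUCT_RE = re.compile(
    r"product:\s*(?P<name>.*?)\s*"
    r"\[(?P<vendor>[0-9A-Fa-f]{4}):(?P<device>[0-9A-Fa-f]{4})\]\s*$"
)
NVIDIA_VENDOR_ID = "10DE"  # PCI vendor ID assigned to NVIDIA


def parse_display_products(lshw_output):
    """Return (name, vendor_id, device_id) for every product: line found."""
    products = []
    for line in lshw_output.splitlines():
        match = PRODUCT_RE.search(line)
        if match:
            products.append((match.group("name"),
                             match.group("vendor").upper(),
                             match.group("device").upper()))
    return products


def nvidia_cards(lshw_output):
    """Keep only devices whose PCI vendor ID is NVIDIA's."""
    return [p for p in parse_display_products(lshw_output)
            if p[1] == NVIDIA_VENDOR_ID]


if __name__ == "__main__":
    sample = "       product: GM107M [GeForce GTX 950M] [10DE:139A]\n"
    print(nvidia_cards(sample))
    # [('GM107M [GeForce GTX 950M]', '10DE', '139A')]
```

The device ID (139A here) identifies the exact GPU model, which is what the driver search page is matching against.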

Here is the driver version for my computer:

    LINUX X64 (AMD64/EM64T) DISPLAY DRIVER
    Version:          375.20
    Release Date:     2016.11.18
    Operating System: Linux 64-bit
    Language:         English (US)
    File Size:        72.37 MB

According to the search results, my driver version should be 375.20. To double-check, you can also list the drivers available to you with this command:

    ubuntu-drivers devices

The result matches the version found by the search: the recommended driver is also 375.

    == /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
    vendor   : NVIDIA Corporation
    model    : GM107M [GeForce GTX 950M]
    modalias : pci:v000010DEd0000139Asv000017AAsd0000380Bbc03sc02i00
    driver   : nvidia-367 - third-party free
    driver   : nvidia-375 - third-party free recommended
    driver   : nvidia-364 - third-party free
    driver   : nvidia-358 - third-party free
    driver   : xserver-xorg-video-nouveau - distro free builtin
    driver   : nvidia-370 - third-party free

    == cpu-microcode.py ==
    driver   : intel-microcode - distro non-free
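The recommended entry can also be picked out of the `ubuntu-drivers devices` output automatically. The helpers below are hypothetical. They assume the `driver : <package> - <flags>` line format shown above, look for the word recommended among the flags, and build the same apt-get command used in the next step.

```python
def recommended_driver(ubuntu_drivers_output):
    """Return the package flagged 'recommended' by `ubuntu-drivers devices`, or None."""
    for line in ubuntu_drivers_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "driver":
            # e.g. value == " nvidia-375 - third-party free recommended"
            fields = value.split()
            if fields and "recommended" in fields[1:]:
                return fields[0]
    return None


def apt_install_command(package):
    """The install command used in this article, as an argument list."""
    return ["sudo", "apt-get", "install", package]


if __name__ == "__main__":
    sample = """\
driver   : nvidia-367 - third-party free
driver   : nvidia-375 - third-party free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin
"""
    package = recommended_driver(sample)
    print(" ".join(apt_install_command(package)))  # sudo apt-get install nvidia-375
```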

Finally we can install the matching driver with the following command:

    # version: 375
    sudo apt-get install nvidia-375
    # for your own version xxx:
    # sudo apt-get install nvidia-xxx

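Is the installation slow, or is the package not found? Switch to a different software source (mirror); search for how to do it. The simplest way is in the GUI under System -> Settings -> Software & Updates, where you can pick another mirror such as Aliyun or USTC. (I suddenly could not connect to the USTC mirror, which was a real pain.) Then run the following commands: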

    sudo apt-get install mesa-common-dev
    sudo apt-get install freeglut3-dev

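After the installation finishes, reboot, and the driver should be working. If searching for nvidia in the Dash shows something like NVIDIA X Server Settings, the driver was installed successfully. Next, install CUDA 8.
(Screenshot: NVIDIA-DashBoard, showing NVIDIA X Server Settings)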
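Besides the NVIDIA X Server Settings entry, you can check from a script which driver version the kernel actually loaded. This sketch reads /proc/driver/nvidia/version. The exact line format ("NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.20 ...") is an assumption based on what the nvidia module usually writes there, so treat the regular expression as illustrative; the function names are hypothetical.

```python
import re

# Assumption: with the nvidia kernel module loaded, this file contains a line like
# "NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.20  <build date>".
VERSION_FILE = "/proc/driver/nvidia/version"
MODULE_RE = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")


def parse_driver_version(text):
    """Extract the kernel-module version string, or None if absent."""
    match = MODULE_RE.search(text)
    return match.group(1) if match else None


def loaded_driver_version(path=VERSION_FILE):
    """Read the loaded driver version; None if the file is missing (module not loaded)."""
    try:
        with open(path) as f:
            return parse_driver_version(f.read())
    except (IOError, OSError):
        return None


if __name__ == "__main__":
    print(loaded_driver_version())
```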

Install CUDA 8

First, download CUDA Toolkit 8.0 (you can register an account yourself).
(Screenshot: CUDA8)
Be sure to choose the runfile. Once the download finishes, run:

    sudo sh cuda_8.0.44_linux.run --override

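This starts the installer. It begins with the End User License Agreement; press Ctrl+C to skip the rest of it, then type accept. The interactive part follows. For the first question, Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 367.48?, answer n, because you have already installed the driver.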

    Using more to view the EULA.
    End User License Agreement
    --------------------------

    Preface
    -------

    The following contains specific license terms and conditions
    for four separate NVIDIA products. By accepting this
    agreement, you agree to comply with all the terms and
    conditions applicable to the specific product(s) included
    herein.

    NVIDIA CUDA Toolkit

    Description

    The NVIDIA CUDA Toolkit provides command-line and graphical
    tools for building, debugging and optimizing the performance
    of applications accelerated by NVIDIA GPUs, runtime and math
    libraries, and documentation including programming guides,
    user manuals, and API references. The NVIDIA CUDA Toolkit
    License Agreement is available in Chapter 1.

    Default Install Location of CUDA Toolkit

    Windows platform:

    Do you accept the previously read EULA?
    accept/decline/quit: accept

    Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 367.48?
    (y)es/(n)o/(q)uit: n

    Install the CUDA 8.0 Toolkit?
    (y)es/(n)o/(q)uit: y

    Enter Toolkit Location
     [ default is /usr/local/cuda-8.0 ]:

    Do you want to install a symbolic link at /usr/local/cuda?
    (y)es/(n)o/(q)uit: y

    Install the CUDA 8.0 Samples?
    (y)es/(n)o/(q)uit: y

    Enter CUDA Samples Location
     [ default is /home/kinny ]:

    Installing the CUDA Toolkit in /usr/local/cuda-8.0 ...
    Missing recommended library: libXmu.so

    Installing the CUDA Samples in /home/kinny ...
    Copying samples to /home/kinny/NVIDIA_CUDA-8.0_Samples now...
    Finished copying samples.

    ============ Summary ============

    Driver:   Not Selected
    Toolkit:  Installed in /usr/local/cuda-8.0
    Samples:  Installed in /home/kinny, but missing recommended libraries

    Please make sure that
     -   PATH includes /usr/local/cuda-8.0/bin
     -   LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root

    To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin

    Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.

    ***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
    To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
        sudo <CudaInstaller>.run -silent -driver

    Logfile is /tmp/cuda_install_17494.log
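The installer's warning states a concrete requirement: CUDA 8.0 needs driver 361.00 or newer. Below is a small sketch of that check (Python 3; helper names hypothetical). It compares versions numerically, as integer tuples, rather than as strings, because a plain string comparison would rank "99.1" above "361.00".

```python
# From the installer warning: "A driver of version at least 361.00 is
# required for CUDA 8.0 functionality to work."
CUDA8_MIN_DRIVER = "361.00"


def version_tuple(version):
    """'375.20' -> (375, 20), so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_supports_cuda8(driver_version, minimum=CUDA8_MIN_DRIVER):
    """True if driver_version is at least the CUDA 8.0 minimum."""
    return version_tuple(driver_version) >= version_tuple(minimum)


if __name__ == "__main__":
    for v in ("375.20", "367.48", "352.63"):
        print(v, driver_supports_cuda8(v))
```

The 375.20 driver installed earlier passes this check, which is why answering n to the bundled 367.48 driver is safe.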

Configure the CUDA environment variables

    export PATH="$PATH:/usr/local/cuda-8.0/bin"
    export LD_LIBRARY_PATH="/usr/local/cuda-8.0/lib64"
    nvidia-smi
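Note that the second export above replaces any LD_LIBRARY_PATH you already had. Here is a minimal Python sketch of the safer approach, which appends the CUDA directories to the existing values instead of overwriting them. The helper names are hypothetical; /usr/local/cuda-8.0 is the install location from the installer log.

```python
import os

CUDA_HOME = "/usr/local/cuda-8.0"  # install location from the installer log


def append_path(existing, new_dir):
    """Append new_dir to a colon-separated path list unless already present."""
    parts = [p for p in (existing or "").split(":") if p]
    if new_dir not in parts:
        parts.append(new_dir)
    return ":".join(parts)


def cuda_environment(base_env, cuda_home=CUDA_HOME):
    """Return a copy of base_env with CUDA's bin/ and lib64/ appended."""
    env = dict(base_env)
    env["PATH"] = append_path(env.get("PATH"), os.path.join(cuda_home, "bin"))
    env["LD_LIBRARY_PATH"] = append_path(env.get("LD_LIBRARY_PATH"),
                                         os.path.join(cuda_home, "lib64"))
    return env


if __name__ == "__main__":
    env = cuda_environment(os.environ)
    print(env["LD_LIBRARY_PATH"])
```

The returned dict is meant for child processes, for example `subprocess.call(["nvidia-smi"], env=env)`. The dynamic loader reads LD_LIBRARY_PATH when a process starts, so changing it inside an already running Python does not affect that process's own library lookups.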

If you see output like the following, the configuration succeeded:
(Screenshot: nvidia-smi output)

Install the deep learning library cuDNN

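First, download cuDNN 5.1. Downloading it directly is extremely slow, so you have to go through a proxy. I downloaded it from the terminal. Note that this requires that you have already registered as an NVIDIA developer!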

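    proxychains wget https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v5.1/prod/8.0/cudnn-8.0-linux-x64-v5.1-tgz
    # This is refused with "Forbidden": the download requires an authenticated developer session.
    # Start the download in Chrome first, then copy the real download URL from "Show all" (the downloads page).
    # Quote the URL: unquoted, the shell treats "&" as "run in background" and cuts off the file=... part.
    proxychains wget "http://developer.download.nvidia.com/compute/machine-learning/cudnn/secure/v5.1/prod/8.0/cudnn-8.0-linux-x64-v5.1.tgz?autho=1479703345_7fbb517b03361780b45a2c43277bb9ac&file=cudnn-8.0-linux-x64-v5.1.tgz"
    # This time it worked, and reasonably fast. The saved file name is wrong, though;
    # rename it to cudnn-8.0-linux-x64-v5.1.tgz, then extract it:
    tar xvzf cudnn-8.0-linux-x64-v5.1.tgz
    # Copy the library and header into the CUDA directory (it must be the one you installed, e.g. /usr/local/cuda-8.0).
    # If you answered y to the symlink question in the CUDA installer, /usr/local/cuda -> /usr/local/cuda-8.0/ already exists.
    sudo cp cuda/include/cudnn.h /usr/local/cuda/include
    sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
    sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*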
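To confirm that the copied header is the version you expect, you can read the version macros from cudnn.h: cuDNN 5.x declares CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL there. Below is a minimal sketch that assumes the /usr/local/cuda path used above; the function names are hypothetical.

```python
import re

# cuDNN 5.x declares its version in cudnn.h with lines such as
#   #define CUDNN_MAJOR      5
#   #define CUDNN_MINOR      1
#   #define CUDNN_PATCHLEVEL <n>
DEFINE_RE = re.compile(r"^#define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)",
                       re.MULTILINE)


def cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cudnn.h text, or None."""
    found = dict(DEFINE_RE.findall(header_text))
    try:
        return tuple(int(found[key]) for key in ("MAJOR", "MINOR", "PATCHLEVEL"))
    except KeyError:
        return None


def installed_cudnn_version(header_path="/usr/local/cuda/include/cudnn.h"):
    """Version of the installed header, or None if it cannot be read."""
    try:
        with open(header_path) as f:
            return cudnn_version(f.read())
    except (IOError, OSError):
        return None


if __name__ == "__main__":
    print(installed_cudnn_version())
```

If this prints None on a machine where the file exists, check the permissions set by the chmod a+r step.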

Install the GPU-enabled TensorFlow for Python 2.7 (see the official website for details)

    export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.11.0-cp27-none-linux_x86_64.whl
    sudo pip install --upgrade $TF_BINARY_URL

    # Verify
    $ python
    Python 2.7.12 (default, Jul  1 2016, 15:12:24)
    [GCC 5.4.0 20160609] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow
    I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
    I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
    I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
    I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
    I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
    >>> quit()

All done!
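The "successfully opened CUDA library" lines are the useful signal here. Below is a sketch that checks a captured import log for the five libraries listed above and reports any that are missing. It assumes the log line format shown above, and the helper names are hypothetical. To produce the log, run for example `python -c "import tensorflow" > tf_import.log 2>&1`, then pass the file name as the first argument.

```python
import re
import sys

# Libraries the successful import above reported as opened.
EXPECTED_LIBS = {"libcublas.so", "libcudnn.so", "libcufft.so",
                 "libcuda.so.1", "libcurand.so"}
OPENED_RE = re.compile(r"successfully opened CUDA library (\S+) locally")


def opened_cuda_libraries(log_text):
    """Set of library names reported as opened in the import log."""
    return set(OPENED_RE.findall(log_text))


def missing_cuda_libraries(log_text, expected=EXPECTED_LIBS):
    """Expected libraries that never appear in the log, sorted."""
    return sorted(expected - opened_cuda_libraries(log_text))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            missing = missing_cuda_libraries(f.read())
        print(missing or "all expected CUDA libraries opened")
```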

Errors

1. libcudart.so.8.0: cannot open shared object file: No such file or directory

    kinny@kinny-Lenovo-XiaoXin:~/Study/tensorflow-0.11.0rc0/tensorflow/models/image/mnist$ python convolutional.py
    Traceback (most recent call last):
      File "convolutional.py", line 34, in <module>
        import tensorflow as tf
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 23, in <module>
        from tensorflow.python import *
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
        from tensorflow.python import pywrap_tensorflow
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
        _pywrap_tensorflow = swig_import_helper()
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
        _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
    ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory

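The fix is to set the environment variables: replace the CUDA variables set earlier with the following, which are the ones the TensorFlow website asks for: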

    export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
    export CUDA_HOME=/usr/local/cuda
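Here is a rough diagnostic sketch for checking that the new LD_LIBRARY_PATH actually points at a directory containing libcudart.so.8.0 (the function name is hypothetical). It only scans LD_LIBRARY_PATH. The real dynamic loader also consults the ldconfig cache (/etc/ld.so.conf) and the default system directories, so a None result is a hint, not proof. Export the variable before starting Python, because the loader reads it at process start.

```python
import os


def find_in_library_path(soname, ld_library_path=None):
    """Return the first LD_LIBRARY_PATH directory containing soname, or None.

    Only LD_LIBRARY_PATH is searched; the ldconfig cache and default
    system directories that the real loader also uses are ignored.
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in ld_library_path.split(":"):
        # isfile() follows symlinks, so libcudart.so.8.0 -> libcudart.so.8.0.x counts
        if directory and os.path.isfile(os.path.join(directory, soname)):
            return directory
    return None


if __name__ == "__main__":
    print(find_in_library_path("libcudart.so.8.0"))
```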

2. TypeError: run() got an unexpected keyword argument 'argv'

    Traceback (most recent call last):
      File "convolutional.py", line 339, in <module>
        tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
    TypeError: run() got an unexpected keyword argument 'argv'

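The fix is to remove the argv argument from the tf.app.run(...) call at the bottom of the script, i.e. call tf.app.run(main=main), because the tf.app.run() in the installed TensorFlow does not accept an argv keyword.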
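Deleting argv is the simplest fix. If the same script has to run both under this older TensorFlow and under a newer one whose tf.app.run() accepts argv, a small shim can inspect the signature first. This is a hypothetical helper for Python 3 only, since inspect.signature does not exist on the author's Python 2.7. If introspection fails, it falls back to the argv-less call. In the example script it would be used as `run_compat(tf.app.run, main, [sys.argv[0]] + unparsed)`.

```python
import inspect


def accepts_keyword(func, name):
    """True if func can be called with keyword argument `name`."""
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):  # some callables have no introspectable signature
        return False
    if name in params and params[name].kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY):
        return True
    # a **kwargs parameter also accepts the keyword
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def run_compat(run, main, argv):
    """Call run(main=..., argv=...) if supported, else run(main=...)."""
    if accepts_keyword(run, "argv"):
        return run(main=main, argv=argv)
    return run(main=main)
```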

Using a Python virtual environment

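Running the mnist example with the GPU version was very slow; it basically froze while downloading and reading the data! To compare GPU and CPU performance, I installed the CPU version of TensorFlow in a virtual environment: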

    sudo apt-get install python-pip python-dev python-virtualenv
    mkdir ~/py2virtualenv
    virtualenv --system-site-packages ~/py2virtualenv/tensorflowcpu
    source ~/py2virtualenv/tensorflowcpu/bin/activate
    export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.11.0-cp27-none-linux_x86_64.whl
    pip install --upgrade $TF_BINARY_URL
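With different TensorFlow builds in the system Python and in the virtualenvs, it helps to confirm which interpreter is actually running. Below is a best-effort sketch (function name hypothetical). The classic virtualenv used here sets sys.real_prefix, while Python 3's venv and virtualenv 20+ make sys.prefix differ from sys.base_prefix.

```python
import sys


def in_virtualenv():
    """Best-effort check for an active virtual environment."""
    # Classic virtualenv (as used in this article) sets sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv / newer virtualenv: prefix differs from the base interpreter's prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtualenv active: %s, prefix: %s" % (in_virtualenv(), sys.prefix))
```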

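It turned out that the CPU version downloads and reads the data quickly. My understanding is that the CPU is suited to I/O, control logic and simple arithmetic such as additions, whereas the GPU is poor at I/O-heavy work and simple additions but extremely strong at matrix operations. After downloading the MNIST dataset to my machine, I ran tensorflow/tensorflow/models/image/mnist/convolutional.py with both the CPU and the GPU version. The results: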

    // CPU version
    Step 8100 (epoch 9.43), 130.6 ms
    Minibatch loss: 1.630, learning rate: 0.006302
    Minibatch error: 0.0%
    Validation error: 0.8%
    // averaged over each 100 steps: about 130.64 ms per step
    real    19m5.685s
    user    67m33.720s
    sys     0m12.340s

    // GPU version
    Step 8100 (epoch 9.43), 23.2 ms
    Minibatch loss: 1.634, learning rate: 0.006302
    Minibatch error: 0.0%
    Validation error: 0.9%
    // averaged over each 100 steps: about 23.2 ms per step
    real    3m28.296s
    user    2m45.888s
    sys     0m29.064s

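On dense matrix computation the GPU completely outclasses the CPU: here it is roughly 5.5 to 5.6 times faster (130.6 ms vs 23.2 ms per step; 19m05.7s vs 3m28.3s wall-clock). Mine is a GTX 950M; I don't know how today's GTX 1080M would do.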
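The speedup figures follow directly from the logged numbers. Below is a small sketch that converts bash `time` fields such as 19m5.685s into seconds and divides them (Python 3; helper names hypothetical). It uses only the values reported above.

```python
import re

TIME_RE = re.compile(r"^(?:(\d+)m)?(\d+(?:\.\d+)?)s$")


def bash_time_to_seconds(value):
    """Convert a bash `time` field such as '19m5.685s' to seconds."""
    match = TIME_RE.match(value.strip())
    if not match:
        raise ValueError("unrecognised time value: %r" % value)
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def speedup(slow, fast):
    """Ratio of two bash `time` values, slow / fast."""
    return bash_time_to_seconds(slow) / bash_time_to_seconds(fast)


if __name__ == "__main__":
    # `real` times and per-step times from the CPU and GPU runs above
    print("wall-clock speedup: %.2fx" % speedup("19m5.685s", "3m28.296s"))  # 5.50x
    print("per-step speedup:   %.2fx" % (130.6 / 23.2))                    # 5.63x
```

The CPU run's user time (67m33s) being far larger than its real time (19m05s) also shows that the CPU version was already using several cores.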

Upgrading to r0.12

To stop affecting the main environment of Ubuntu's built-in Python 2.7, I simply uninstalled TensorFlow with sudo pip uninstall tensorflow. Then I created a virtual environment named tensorflowgpu:

    virtualenv --system-site-packages ~/py2virtualenv/tensorflowgpu
    source ~/py2virtualenv/tensorflowgpu/bin/activate
    export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.0rc1-cp27-none-linux_x86_64.whl
    pip2 install --upgrade $TF_BINARY_URL

    ~/py2virtualenv/tensorflowgpu/bin$ ./pip2 show tensorflow
    Name: tensorflow
    Version: 0.12.0rc1
    Summary: TensorFlow helps the tensors flow
    Home-page: http://tensorflow.org/
    Author: Google Inc.
    Author-email: opensource@google.com
    License: Apache 2.0
    Location: /usr/local/lib/python2.7/dist-packages
    Requires: mock, numpy, protobuf, wheel, six

Upgrading the earlier CPU version from 0.11 to 0.12

    source ~/py2virtualenv/tensorflowcpu/bin/activate
    export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.0rc1-cp27-none-linux_x86_64.whl
    pip install --upgrade $TF_BINARY_URL

    (tensorflowcpu) kinny@kinny-Lenovo-XiaoXin:~/Study/ocr$ pip show tensorflow
    Name: tensorflow
    Version: 0.12.0rc1
    Summary: TensorFlow helps the tensors flow
    Home-page: http://tensorflow.org/
    Author: Google Inc.
    Author-email: opensource@google.com
    License: Apache 2.0
    Location: /home/kinny/py2virtualenv/tensorflowcpu/lib/python2.7/site-packages
    Requires: mock, numpy, protobuf, wheel, six
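The Location field tells you which copy of the package you are actually getting. This is worth checking, because with --system-site-packages pip can also report a copy that lives outside the virtualenv: the GPU output above shows Location: /usr/local/lib/python2.7/dist-packages. The sketch below parses `pip show` output ("Key: value" lines) and tests whether Location is under a given environment directory. The helper names are hypothetical.

```python
def parse_pip_show(text):
    """Turn `pip show` output ("Key: value" lines) into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key and not key.startswith(" "):
            info[key.strip()] = value.strip()
    return info


def installed_inside(info, env_dir):
    """True if the package's Location lies under env_dir."""
    location = info.get("Location", "")
    prefix = env_dir.rstrip("/") + "/"
    return location == env_dir.rstrip("/") or location.startswith(prefix)


if __name__ == "__main__":
    sample = """\
Name: tensorflow
Version: 0.12.0rc1
Location: /home/kinny/py2virtualenv/tensorflowcpu/lib/python2.7/site-packages
"""
    info = parse_pip_show(sample)
    print(info["Version"], installed_inside(info, "/home/kinny/py2virtualenv/tensorflowcpu"))
```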

MemoryError when installing TensorFlow on a low-spec machine

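The cause is that pip caches by default and reads the whole package into memory, so an overly large package triggers this error. Installing with --no-cache-dir avoids it. The machine where this happened: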

  • OS: Ubuntu Server 16.04.1 LTS, 64-bit
  • CPU: 1 core
  • Memory: 1 GB
  • System disk: 20 GB (cloud disk) xxxxxx
  • Public bandwidth: 1 Mbps
    sudo pip --no-cache-dir install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.0rc0-cp27-none-linux_x86_64.whl
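To illustrate the alternative to holding the whole download in RAM, here is a Python 3 sketch that streams a file to disk in fixed-size chunks, so memory use stays flat whatever the file size. It illustrates the idea only and says nothing about pip's internals. The helper name and the 1 MiB chunk size are arbitrary choices. After downloading the wheel this way (keeping its original file name), you could install the local file with `pip install --no-cache-dir ./<wheel file>`.

```python
import shutil
import urllib.request

CHUNK_SIZE = 1 << 20  # 1 MiB per read; memory use does not grow with file size


def download_to_file(url, dest_path, chunk_size=CHUNK_SIZE):
    """Stream url to dest_path without holding the whole body in memory."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest_path
```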

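The difference between the two commands below is especially noticeable when you have to get around the firewall: the first one is very slow. The second installs an Ubuntu package, looked up in Ubuntu's software sources, which have mirrors inside China and download reasonably fast. The first downloads directly from Python's package site (PyPI), which is very slow and often barely downloads at all.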

    sudo pip install numpy
    sudo apt-get install python-numpy

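In general, you can install a Python package either with sudo apt-get install python-XXX or directly with sudo pip install XXX. If the former cannot find the package, read the message to see whether the package has a different name; most packages are available as Ubuntu packages.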
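The "try apt first, fall back to pip" habit can be sketched as below (Python 3; helper names hypothetical). It assumes the python-<name> naming convention described above, which covers Python 2 modules; Python 3 modules on Ubuntu are named python3-<name>, and some packages use entirely different names. It also assumes that `apt-cache show` exits with a non-zero status for an unknown package.

```python
import subprocess


def apt_package_name(pip_name):
    """Ubuntu's usual name for a Python 2 module (not always exact)."""
    return "python-" + pip_name.lower()


def apt_has_package(package):
    """True if apt knows the package (assumes non-zero exit when it does not)."""
    try:
        return subprocess.call(["apt-cache", "show", package],
                               stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL) == 0
    except OSError:  # apt-cache missing, e.g. not a Debian/Ubuntu system
        return False


def install_command(pip_name, apt_available=apt_has_package):
    """Prefer the Ubuntu package (fast local mirrors); fall back to pip."""
    package = apt_package_name(pip_name)
    if apt_available(package):
        return ["sudo", "apt-get", "install", package]
    return ["sudo", "pip", "install", pip_name]
```

Passing `apt_available` as a parameter keeps the decision logic testable without touching the real package database.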

References

  1. how-to-install-the-latest-nvidia-drivers-on-ubuntu-16-04-xenial-xerus
  2. Deep learning workstation setup: Ubuntu 16.04 + NVIDIA GTX 1080 + CUDA 8 (深度学习主机环境配置-ubuntu-16-04-nvidia-gtx-1080-cuda-8)
  3. install-gpu-tensorflow-from-sources-w-ubuntu-16-04-and-cuda-8-0-rc
  4. Ubuntu 16.04 + CUDA 8.0 + caffe configuration (Ubuntu16.04+CUDA8.0+caffe配置)
  5. fully_connected_preloaded.py on GPU trains slower then on CPU #838
