Installing and configuring tensorflow_gpu 1.2.0 on Arch Linux (and other generic Linux distributions)

Source: Internet | Editor: 程序博客网 | Time: 2024/06/01 23:19

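My workplace set up an Arch Linux server with an NVIDIA GTX 1080 Ti for CUDA development, and it got me itching to try it. I have some machine learning work and study material on hand, so I wanted to see what sports-car speed on a GPU feels like.
Having written CUDA before, I still remember how painful developing in C was. Machine learning is highly flexible and closely tied to business needs, so there is no point spending time reinventing the wheel. Given Google's strong backing, I decided to go with TensorFlow and Keras and learn them.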

This post does not go into installing the graphics driver and the CUDA environment. Here are two links, one for the driver and one for CUDA:
http://www.nvidia.cn/download/driverResults.aspx/117766/cn
https://developer.nvidia.com/cuda-downloads
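Unless you are on an ancient kernel, installing the driver is foolproof, so I won't say more about it.
CUDA is a bit more troublesome: NVIDIA only provides installers for a few major distributions, and smaller ones such as Arch are not covered. However, Arch maintains its own CUDA package, so you can simply install it with pacman (pacman -S cuda).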
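Before going further it can help to confirm which CUDA release pacman actually installed, since, as far as I know, the official TensorFlow 1.2 GPU wheels were built against CUDA 8.0. Below is a minimal Python sketch, assuming nvcc is either on PATH or at /opt/cuda/bin/nvcc (where the Arch package puts it). The helper names are my own, not part of any tool:

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Return (major, minor) parsed from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release(fallback="/opt/cuda/bin/nvcc"):
    """Run nvcc and return its release, or None if nvcc cannot be run.

    Assumption: on Arch, /opt/cuda/bin may not be on PATH in a fresh shell,
    so fall back to the absolute path.
    """
    nvcc = shutil.which("nvcc") or fallback
    try:
        result = subprocess.run([nvcc, "--version"], stdout=subprocess.PIPE,
                                universal_newlines=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    print("CUDA release:", release)
    if release is not None and release != (8, 0):
        print("warning: the official TensorFlow 1.2 GPU wheels expect CUDA 8.0")
```

nvcc prints a line such as "Cuda compilation tools, release 8.0, V8.0.61", which is what the regex keys on.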

Now for the main topic:

On Linux, Google only officially supports (and tests on) Ubuntu, and the install guide contains this sentence: “don’t build a TensorFlow binary yourself unless you are very comfortable building complex packages from source and dealing with the inevitable aftermath should things not go exactly as documented.”

Well, that was discouraging. Running TensorFlow on other Linux distributions turns out to be somewhat more troublesome than I expected.

Besides building from source, there are two workable ways to use TensorFlow on a non-Ubuntu distribution.
1. Docker
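I first tried deploying with Docker.
I could never manage to download the image, whether because my proxy for getting past the firewall stopped working or because the official registry had problems.
On top of that, nvidia-docker was quite unstable: its own bundled test occasionally reported: nvidia-docker-plugin exits with "Error: nvml: Unknown Error". Our time is valuable, so I stopped fiddling with it and will revisit once later versions are stable.
If you are interested, here are the instructions for deploying with Docker: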
https://www.tensorflow.org/install/install_linux#InstallingDocker

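2. Anaconda
The advantage of Anaconda is that it saves time.
It automatically installs most of the libraries that scientific computing and data processing depend on, and it takes care of configuration such as environment variables by itself.
a. First install Anaconda. Download the Linux installer from: https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh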

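b. Run bash Anaconda3-4.4.0-Linux-x86_64.sh to install it. Installing as root is not recommended, because it may conflict with the system's Python. It is better to run the installer as a regular user and install into your home directory.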

c. Create a virtual env dedicated to running TensorFlow:

$ conda create -n tensorflow
$ source activate tensorflow
$ conda install anaconda
$ pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp36-cp36m-linux_x86_64.whl
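Because pip installs into whichever Python it belongs to, it is worth checking that the tensorflow env is really active before the pip step. A small sketch, assuming conda's default layout of <anaconda root>/envs/<env name>; is_conda_env is a hypothetical helper of my own, and CONDA_DEFAULT_ENV is the variable that source activate exports:

```python
import os
import sys
from pathlib import PurePosixPath


def is_conda_env(prefix, env_name):
    """True if `prefix` looks like <conda root>/envs/<env_name> (conda's default layout)."""
    path = PurePosixPath(prefix)
    return path.name == env_name and path.parent.name == "envs"


if __name__ == "__main__":
    # `source activate` exports CONDA_DEFAULT_ENV; sys.prefix is where this python lives.
    print("CONDA_DEFAULT_ENV:", os.environ.get("CONDA_DEFAULT_ENV"))
    print("running inside 'tensorflow' env:", is_conda_env(sys.prefix, "tensorflow"))
```

Run it with the python that pip will use; if it reports False, the wheel would land in the base Anaconda install instead of the env.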

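Note: the newly created conda env gets Python 3.6 by default, so you must install the matching TensorFlow wheel (cp36). If you use Python 2.7, 3.4 or 3.5 instead, pick the corresponding wheel from the table below: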

The URL of the TensorFlow Python package

A few installation mechanisms require the URL of the TensorFlow Python package. The value you specify depends on three factors: operating system, Python version, and CPU only vs. GPU support.

This section documents the relevant values for Linux installations.

Python 2.7
CPU only: https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.2.0-cp27-none-linux_x86_64.whl
GPU support: https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp27-none-linux_x86_64.whl

Python 3.4
CPU only: https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.2.0-cp34-cp34m-linux_x86_64.whl
GPU support: https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp34-cp34m-linux_x86_64.whl

Python 3.5
CPU only: https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.2.0-cp35-cp35m-linux_x86_64.whl
GPU support: https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp35-cp35m-linux_x86_64.whl

Python 3.6
CPU only: https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.2.0-cp36-cp36m-linux_x86_64.whl
GPU support: https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.0-cp36-cp36m-linux_x86_64.whl

Note that GPU support requires the NVIDIA hardware and software described in NVIDIA requirements to run TensorFlow with GPU support.
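The URLs in the table follow a regular pattern, so the right one can be derived from the running interpreter. This is a sketch that only reproduces the 1.2.0 entries listed above; the function name is hypothetical, and I would not assume other TensorFlow releases use the same naming:

```python
import sys

BASE_URL = "https://storage.googleapis.com/tensorflow/linux"
SUPPORTED = {(2, 7), (3, 4), (3, 5), (3, 6)}  # Python versions in the table above


def tf_120_wheel_url(py_major, py_minor, gpu=True):
    """Build the TensorFlow 1.2.0 Linux wheel URL for a given Python version."""
    if (py_major, py_minor) not in SUPPORTED:
        raise ValueError("no TensorFlow 1.2.0 wheel for Python %d.%d" % (py_major, py_minor))
    py_tag = "cp%d%d" % (py_major, py_minor)
    # The 2.7 wheels use the ABI tag "none"; the 3.x wheels use e.g. "cp36m".
    abi_tag = "none" if py_major == 2 else py_tag + "m"
    flavor = "gpu" if gpu else "cpu"
    package = "tensorflow_gpu" if gpu else "tensorflow"
    return "%s/%s/%s-1.2.0-%s-%s-linux_x86_64.whl" % (
        BASE_URL, flavor, package, py_tag, abi_tag)


if __name__ == "__main__":
    try:
        print(tf_120_wheel_url(sys.version_info[0], sys.version_info[1]))
    except ValueError as err:
        print(err)
```

Running it inside the activated env prints the URL to pass to pip install --ignore-installed --upgrade.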

d. Install cuDNN, available at https://developer.nvidia.com/cudnn. Be very careful: the TensorFlow 1.2 we just installed uses cuDNN 5 (v5.1 for CUDA 8.0), so don't download the wrong version.

$ tar zxvf cudnn-8.0-linux-x64-v5.1.tgz
$ sudo cp -r cuda /opt/
$ cd /opt/cuda/lib64
$ ls *libcudnn*
$ sudo ldconfig

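To explain: the first command extracts the downloaded cuDNN archive. This creates a cuda directory containing cuDNN in the current directory. Don't be confused by the name: it is called cuda only so that you can merge it into the CUDA installation directory, so just copy the whole directory over the CUDA install location. (On my server, CUDA is installed under /opt/, i.e. /opt/cuda.) The remaining commands list the copied libcudnn files to confirm they are in place and refresh the dynamic linker cache with ldconfig.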
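A Python counterpart to the ls *libcudnn* check, which also confirms that the major version is 5. It assumes the /opt/cuda/lib64 layout described above and that the shared library is named libcudnn.so.<major>... (which, as far as I know, matches the v5.1 tarball). Both helper names are hypothetical:

```python
import glob
import os
import re


def majors_from_filenames(names):
    """Extract sorted major versions from names like libcudnn.so.5 or libcudnn.so.5.1.10."""
    majors = set()
    for name in names:
        match = re.match(r"libcudnn\.so\.(\d+)", name)
        if match:
            majors.add(int(match.group(1)))
    return sorted(majors)


def cudnn_major_versions(libdir="/opt/cuda/lib64"):
    """List cuDNN major versions found in libdir (empty if the directory is missing)."""
    names = [os.path.basename(p) for p in glob.glob(os.path.join(libdir, "libcudnn*"))]
    return majors_from_filenames(names)


if __name__ == "__main__":
    found = cudnn_major_versions()
    print("cuDNN major versions in /opt/cuda/lib64:", found)
    if 5 not in found:
        print("warning: TensorFlow 1.2 expects cuDNN 5 (libcudnn.so.5)")
```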

e. Verify that TensorFlow was installed successfully:

$ source activate tensorflow
$ python
>>> import tensorflow as tf

If no error is reported, the installation succeeded.
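A clean import does not necessarily mean a GPU device can actually be created, so listing the local devices is a stronger check. This sketch uses tensorflow.python.client.device_lib.list_local_devices(), an internal TF 1.x module rather than a documented public API, so treat it as an assumption that may break in other versions. The filtering helper is my own:

```python
def gpu_device_names(names):
    """Keep names that refer to a GPU: "/gpu:0" (TF 1.2 style) or "/device:GPU:0" (newer)."""
    return [name for name in names if "gpu:" in name.lower()]


def local_device_names():
    """List device names TensorFlow can see; imports TensorFlow lazily."""
    from tensorflow.python.client import device_lib  # internal TF 1.x module
    return [device.name for device in device_lib.list_local_devices()]


if __name__ == "__main__":
    try:
        names = local_device_names()
    except ImportError:
        print("TensorFlow is not importable here; activate the env first")
    else:
        print("all devices:", names)
        print("GPU devices:", gpu_device_names(names))
```

On a working setup the GPU list should contain /gpu:0, matching the "Creating TensorFlow device (/gpu:0)" line in the GPU log below.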

f. Install Keras:
$ pip install keras
At this point, TensorFlow and Keras are both installed.
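Keras 2 prints "Using TensorFlow backend." when it is imported, which is the quickest check. That choice comes from the KERAS_BACKEND environment variable if set, otherwise from the "backend" key in ~/.keras/keras.json. A sketch of that lookup, assuming Keras 2.x defaults; the helper names and the fallback on a malformed file are my own choices, not Keras's exact logic:

```python
import json
import os

DEFAULT_BACKEND = "tensorflow"  # Keras 2.x default backend


def backend_from_config(text):
    """Read the "backend" key from keras.json text, falling back to the default."""
    try:
        config = json.loads(text)
    except ValueError:
        return DEFAULT_BACKEND
    if not isinstance(config, dict):
        return DEFAULT_BACKEND
    return config.get("backend", DEFAULT_BACKEND)


def keras_backend(config_path=None, environ=os.environ):
    """KERAS_BACKEND wins; otherwise use keras.json; otherwise the default."""
    if environ.get("KERAS_BACKEND"):
        return environ["KERAS_BACKEND"]
    if config_path is None:
        config_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    try:
        with open(config_path) as f:
            return backend_from_config(f.read())
    except OSError:
        return DEFAULT_BACKEND


if __name__ == "__main__":
    print("Keras backend:", keras_backend())
```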

Finally, let's benchmark with the "hello world" from the Keras examples (mnist_cnn.py):

CPU run:

mnist_cnn.py', wdir='E:/lzjwork/develop/keras-master/examples')
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
60000/60000 [==============================] - 132s - loss: 0.3531 - acc: 0.8916 - val_loss: 0.0888 - val_acc: 0.9726
Epoch 2/12
60000/60000 [==============================] - 131s - loss: 0.1191 - acc: 0.9653 - val_loss: 0.0540 - val_acc: 0.9827
Epoch 3/12
60000/60000 [==============================] - 133s - loss: 0.0878 - acc: 0.9743 - val_loss: 0.0433 - val_acc: 0.9863
Epoch 4/12
60000/60000 [==============================] - 129s - loss: 0.0731 - acc: 0.9783 - val_loss: 0.0387 - val_acc: 0.9866
Epoch 5/12
60000/60000 [==============================] - 124s - loss: 0.0639 - acc: 0.9806 - val_loss: 0.0358 - val_acc: 0.9879
Epoch 6/12
60000/60000 [==============================] - 124s - loss: 0.0550 - acc: 0.9842 - val_loss: 0.0342 - val_acc: 0.9883
Epoch 7/12
60000/60000 [==============================] - 123s - loss: 0.0529 - acc: 0.9845 - val_loss: 0.0307 - val_acc: 0.9895
Epoch 8/12
60000/60000 [==============================] - 123s - loss: 0.0484 - acc: 0.9855 - val_loss: 0.0303 - val_acc: 0.9893
Epoch 9/12
60000/60000 [==============================] - 124s - loss: 0.0432 - acc: 0.9872 - val_loss: 0.0296 - val_acc: 0.9902
Epoch 10/12
60000/60000 [==============================] - 124s - loss: 0.0424 - acc: 0.9871 - val_loss: 0.0270 - val_acc: 0.9911
Epoch 11/12
60000/60000 [==============================] - 122s - loss: 0.0399 - acc: 0.9874 - val_loss: 0.0276 - val_acc: 0.9909
Epoch 12/12
60000/60000 [==============================] - 123s - loss: 0.0362 - acc: 0.9892 - val_loss: 0.0264 - val_acc: 0.9913
Test loss: 0.0264107687845
Test accuracy: 0.9913

GPU run:

2017-06-21 17:44:40.639353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: Graphics Device
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:01:00.0
Total memory: 10.91GiB
Free memory: 10.74GiB
2017-06-21 17:44:40.639367: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-06-21 17:44:40.639371: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y
2017-06-21 17:44:40.639379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Graphics Device, pci bus id: 0000:01:00.0)
60000/60000 [==============================] - 4s - loss: 0.3362 - acc: 0.8979 - val_loss: 0.0779 - val_acc: 0.9756
Epoch 2/12
60000/60000 [==============================] - 3s - loss: 0.1136 - acc: 0.9670 - val_loss: 0.0504 - val_acc: 0.9833
Epoch 3/12
60000/60000 [==============================] - 3s - loss: 0.0869 - acc: 0.9745 - val_loss: 0.0437 - val_acc: 0.9858
Epoch 4/12
60000/60000 [==============================] - 3s - loss: 0.0729 - acc: 0.9786 - val_loss: 0.0369 - val_acc: 0.9874
Epoch 5/12
60000/60000 [==============================] - 3s - loss: 0.0630 - acc: 0.9807 - val_loss: 0.0360 - val_acc: 0.9872
Epoch 6/12
60000/60000 [==============================] - 3s - loss: 0.0567 - acc: 0.9831 - val_loss: 0.0337 - val_acc: 0.9884
Epoch 7/12
60000/60000 [==============================] - 3s - loss: 0.0508 - acc: 0.9851 - val_loss: 0.0293 - val_acc: 0.9901
Epoch 8/12
60000/60000 [==============================] - 3s - loss: 0.0475 - acc: 0.9859 - val_loss: 0.0326 - val_acc: 0.9879
Epoch 9/12
60000/60000 [==============================] - 3s - loss: 0.0451 - acc: 0.9867 - val_loss: 0.0315 - val_acc: 0.9896
Epoch 10/12
60000/60000 [==============================] - 3s - loss: 0.0420 - acc: 0.9872 - val_loss: 0.0283 - val_acc: 0.9905
Epoch 11/12
60000/60000 [==============================] - 3s - loss: 0.0394 - acc: 0.9885 - val_loss: 0.0300 - val_acc: 0.9906
Epoch 12/12
60000/60000 [==============================] - 3s - loss: 0.0377 - acc: 0.9884 - val_loss: 0.0289 - val_acc: 0.9905
Test loss: 0.0288597793579
Test accuracy: 0.9905

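It really feels like trading an ox cart for a rocket: each epoch dropped from roughly 122 to 133 s on the CPU to 3 to 4 s on the GPU, about a 40x speedup, with essentially the same test accuracy.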
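To put a number on it, the per-epoch times can be pulled out of the logs and averaged. A sketch assuming Keras 2.0-style progress lines like the ones above; the regex and helper names are mine:

```python
import re

# Matches the per-epoch time in Keras 2.0-style progress lines, e.g. "] - 132s - loss: ...".
EPOCH_TIME = re.compile(r"\] - (\d+)s - loss")


def epoch_seconds(log_text):
    """Return the per-epoch wall times (in seconds) found in a training log."""
    return [int(s) for s in EPOCH_TIME.findall(log_text)]


def mean_speedup(cpu_seconds, gpu_seconds):
    """Ratio of mean CPU epoch time to mean GPU epoch time."""
    cpu_mean = sum(cpu_seconds) / len(cpu_seconds)
    gpu_mean = sum(gpu_seconds) / len(gpu_seconds)
    return cpu_mean / gpu_mean


if __name__ == "__main__":
    # Per-epoch times copied from the two mnist_cnn.py runs above.
    cpu = [132, 131, 133, 129, 124, 124, 123, 123, 124, 124, 122, 123]
    gpu = [4] + [3] * 11
    print("speedup: %.1fx" % mean_speedup(cpu, gpu))  # prints "speedup: 40.9x"
```

Keras only reports whole seconds here, so the 3 s GPU epochs are coarse; treat the ratio as a ballpark figure.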
