joy of data

来源：互联网发布：linux内核入门书籍编辑：程序博客网时间：2024/05/21 08:52

http://www.joyofdata.de/blog/

很好的一个网站，关于数据分析等等

In this tutorial I am going to show you how to set upCUDA 7,cuDNN,caffe andDIGITS on ag2.2xlarge EC2 instance (running Ubuntu 14.04 64 bit) and how to get started with DIGITS. For illustrating DIGITS’ application I use a currentKaggle competition about detecting diabetic retinopathy and its state fromfluorescein angiography.

Convolutional Deep Neural Networks for Image Classification

For classification or regression on images you have two choices:

Feature engineering and upon that translating an image into a vector
Relying on a convolutional DNN to figure out the features

Deep Neural Networks are computationally quite demanding. This is the case for two reasons:

The input data is much larger if you use even a small image resolution of 256 x 256 RGB-pixel implies 196’608 input neurons (256 x 256 x 3). If you engineer your features intelligently then a 1000 neurons would be a lot already.
Saddling the network with the burden of figuring out the relevant features also requires a more sophisticated network structure and more layers.

Luckily many of the involved floating point matrix operations have been unintentionally addressed by your graphic card’s GPU.

NVIDIA DIGITS and caffe

There are three major GPU utilizing Deep Learning frameworks available – Theano, Torch and caffe. NVIDIA DIGITS is a web server providing a convenient web interface for training and testing Deep Neural Networks based on caffe. I intend to cover in a future article how to work with caffe. Here I will show you how to set up CUDA

First of all you need an AWS account and g2.2xlarge instance up and running. That is mostly self-explanatory – for the command line parts (and some tips) you might want to have a look at my previous tutorial “Guide to EC2 from the Command Line“. Make sure to add an inbound rule for port 5000 for your IP – b/c this is where the DIGITS server is made available at.

# don't forget to get your system up to datesudo apt-get updatesudo apt-get dist-upgrade

# don't forget to get your system up to date

sudo apt-get update

sudo apt-get dist-upgrade

Installing CUDA 7

Main source for this step is Markus Beissinger’s blog post on setting up Theano.

# installation of required toolssudo apt-get install -y gcc g++ gfortran build-essential \  git wget linux-image-generic libopenblas-dev python-dev \  python-pip python-nose python-numpy python-scipy# downloading the (currently) most recent version of CUDA 7sudo wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.0-28_amd64.deb# installing CUDAsudo dpkg -i cuda-repo-ubuntu1404_7.0-28_amd64.debsudo apt-get updatesudo apt-get install cuda# setting the environment variables so CUDA will be foundecho -e "\nexport PATH=/usr/local/cuda/bin:$PATH" >> .bashrcecho -e "\nexport LD_LIBRARY_PATH=/usr/local/cuda/lib64" >> .bashrcsudo reboot# installing the samples and checking the GPUcuda-install-samples-7.0.sh ~/cd NVIDIA\_CUDA-7.0\_Samples/1\_Utilities/deviceQuery  make  ./deviceQuery

# installation of required tools

sudo apt-get install -y gcc g++ gfortran build-essential \

git wget linux-image-generic libopenblas-dev python-dev \

python-pip python-nose python-numpy python-scipy

# downloading the (currently) most recent version of CUDA 7

sudo wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.0-28_amd64.deb

# installing CUDA

sudo dpkg -i cuda-repo-ubuntu1404_7.0-28_amd64.deb

sudo apt-get update

sudo apt-get install cuda

# setting the environment variables so CUDA will be found

echo -e "\nexport PATH=/usr/local/cuda/bin:$PATH" >> .bashrc

echo -e "\nexport LD_LIBRARY_PATH=/usr/local/cuda/lib64" >> .bashrc

sudo reboot

# installing the samples and checking the GPU

cuda-install-samples-7.0.sh ~/

cd NVIDIA\_CUDA-7.0\_Samples/1\_Utilities/deviceQuery

make

./deviceQuery

Installing cuDNN

To further speed up deep learning relevant calculations it is a good idea to set up the cuDNN library. For that purpose you will have to get an NVIDIA developer account and join the CUDA registered developer program. The last step requires NVIDIA to unlock your account and that might take one or two days. But you can get started also without cuDNN library. As soon as you have the okay from them – download cuDNN and upload it to your instance.

# unpack the librarygzip -d cudnn-6.5-linux-x64-v2.tar.gztar xf cudnn-6.5-linux-x64-v2.tar# copy the library files into CUDA's include and lib folderssudo cp cudnn-6.5-linux-x64-v2/cudnn.h /usr/local/cuda-7.0/includesudo cp cudnn-6.5-linux-x64-v2/libcudnn* /usr/local/cuda-7.0/lib64

# unpack the library

gzip -d cudnn-6.5-linux-x64-v2.tar.gz

tar xf cudnn-6.5-linux-x64-v2.tar

# copy the library files into CUDA's include and lib folders

sudo cp cudnn-6.5-linux-x64-v2/cudnn.h /usr/local/cuda-7.0/include

sudo cp cudnn-6.5-linux-x64-v2/libcudnn* /usr/local/cuda-7.0/lib64

Installing caffe

Main source for this and the following step is the readme of the DIGITS project.

sudo apt-get install libprotobuf-dev libleveldb-dev \  libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev \  libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler \  libatlas-base-dev# the version number of the required branch might change# consult https://github.com/NVIDIA/DIGITS/blob/master/README.mdgit clone --branch v0.11.0 https://github.com/NVIDIA/caffe.gitcd ~/caffe/pythonfor req in $(cat requirements.txt); do sudo pip install $req; donecd ~/caffecp Makefile.config.example Makefile.config# check that USE_CUDNN is set to 1 in case you would# like to use it and to 0 if notmake allmake pymake testmake runtestecho -e "\nexport CAFFE_HOME=/home/ubuntu/caffe" >> ~/.bashrc# load the new environmental variablesbash

sudo apt-get install libprotobuf-dev libleveldb-dev \

libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev \

libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler \

libatlas-base-dev

# the version number of the required branch might change

# consult https://github.com/NVIDIA/DIGITS/blob/master/README.md

git clone --branch v0.11.0 https://github.com/NVIDIA/caffe.git

cd ~/caffe/python

for req in $(cat requirements.txt); do sudo pip install $req; done

cd ~/caffe

cp Makefile.config.example Makefile.config

# check that USE_CUDNN is set to 1 in case you would

# like to use it and to 0 if not

make all

make py

make test

make runtest

echo -e "\nexport CAFFE_HOME=/home/ubuntu/caffe" >> ~/.bashrc

# load the new environmental variables

bash

Installing DIGITS

cd ~git clone https://github.com/NVIDIA/DIGITS.git digitscd digitssudo apt-get install graphviz gunicornfor req in $(cat requirements.txt); do sudo pip install $req; done

cd ~

git clone https://github.com/NVIDIA/DIGITS.git digits

cd digits

sudo apt-get install graphviz gunicorn

for req in $(cat requirements.txt); do sudo pip install $req; done

Starting and Configuring DIGITS

The first time you start DIGITS it will ask you number of questions for the purpose of its configuration. But those settings are pretty much self-explanatory and you can change them afterwards in ~/.digits/digits.cfg . You might want to consider locating your job-directory ( jobs_dir) on an EBS – the data set of about 140’000 PNGs in the example I feature here consumes about 10 GB of space and the trained models (with all its model snapshots) accounts for about 1 GB.

# change into your digits directorycd digits# start the server./digits-devserver

# change into your digits directory

cd digits

# start the server

./digits-devserver

Troubleshooting DIGITS

When you start DIGITS for the first time you might run into a number of errors and warnings. Here’s my take on them.

"libdc1394 error: Failed to initialize libdc1394"# no big deal - either ignore or treat symptomaticallysudo ln /dev/null /dev/raw1394

"libdc1394 error: Failed to initialize libdc1394"

# no big deal - either ignore or treat symptomatically

sudo ln /dev/null /dev/raw1394

"Gtk-WARNING **: Locale not supported by C library."# not sure how serious this is - but it is easy to resolvesudo apt-get install language-pack-en-basesudo dpkg-reconfigure locales# check what locales are available and then ...locale -a# ... set LC_ALL to itecho -e "\nexport LC_ALL=\"en_US.utf8\"" >> ~/.bashrc

"Gtk-WARNING **: Locale not supported by C library."

# not sure how serious this is - but it is easy to resolve

sudo apt-get install language-pack-en-base

sudo dpkg-reconfigure locales

# check what locales are available and then ...

locale -a

# ... set LC_ALL to it

echo -e "\nexport LC_ALL=\"en_US.utf8\"" >> ~/.bashrc

"Gdk-CRITICAL **: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed"# this is a big deal and will cause the server start up to fail:# connect with ssh flags -Xissh -Xi ...

"Gdk-CRITICAL **: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed"

# this is a big deal and will cause the server start up to fail:

# connect with ssh flags -Xi

ssh -Xi ...

"Couldn't import dot_parser, loading of dot files will not be possible."# reinstall pyparsing:sudo pip uninstall pyparsingsudo pip install pyparsing==1.5.7sudo pip install pydot

"Couldn't import dot_parser, loading of dot files will not be possible."

# reinstall pyparsing:

sudo pip uninstall pyparsing

sudo pip install pyparsing==1.5.7

sudo pip install pydot

Getting Started with DIGITS

digits_new_dataset First you have to create the data set on which you want to train a model. You have to provide at least one large set of pictures for the training and optionally two smaller sets for validation and testing. You can either separate those sets (and their correct labels) by means of different folders or – what I’d recommend – by providing corresponding CSVs. Those CSVs are supposed to feature two unnamed tab separated columns. The first column keeps the full path of the image (don’t use ~ for home, but the its path equivalent) and the second column keeps a 0-based index referencing the correct class. You will also have to provide a text file holding the different classes – one per line. digits_new_model For example if you have two classes “pos” (1st line) and “neg” (2nd line) – then an image belonging to class “pos” would have to have a class index of 0 associated with it. Loading might take a while. Loading my 140’000 PNGs with 256×256 resolution took about one hour.

Setting up the model you intend to train is even easier provided you stick with the suggested defaults – just choose the data set you want to use, a network and you’re ready to go! Training a GoogLeNet for 30 epochs on the described data set took about one day and 6 hours. This is why you should make sure that …

… your bidding for a Spot instance is not too low – or you risk it being terminated
… you start the server in tmux session. Otherwise if you lose connection – maybe b/c your IP changes over night – the server process will be killed

Tackling the Diabetic Retinopathy Kaggle challenge

The provided training set consists of about 35 thousand images of high resolution – zipped and split accross five files. The whole zip archive is about 33 GB large. I downloaded the five components directly onto an EBS using lynx – b/c you can just regularly log on and initiate the download. The download speed on the g2.2xlarge instance btw was incredible – you are granted up to 100 MB per second. I started all five downloads in parallel – each going at 6 MB per second. And yes, its mega byte – not mega bit (the unit DSL providers use).

The visible indicators of diabetic retinopathy are as I understand it mostly leaking (aneurysms) and pathologically growing blood vessels. I figure those features are mirror and rotation invariant. So to increase the available training set I created four versions:

(A): As is but resized to 256×256 pixels and saved as PNG
(R): 180 degree rotation of (A)
Vertical mirroring of (A)
Vertical mirroring of (R)

Because the task at hand is obviously not a classification but a regression I abstained from attempting to learn a classification into no DR and the four stages of DR. I labelled all DR cases as “positive” and the no-DR cases respectively as “negative”. This would have to be done for all four possible splits ({0} vs {1,…,4}, …, {0,…,3},{4}) and those predictions would finally be regressed against the actual stage.

The bash script for this transformation you may find on bash commands for the processing.

stay-tuned

The Result

GoogLeNet Well … on one hand I would have liked to see a higher accuracy – on the other hand I can barely (if at all) make out the difference between some healthy cases and some extreme stage four cases. As 73.95% is the share of negative cases – this is also were the accuracy of the network started out at. In the course of 30 epochs it improved about 8 p.p. to 81.8%.

Any Questions?

I highly recommend the DIGITS Google Group for your questions on features and issues. The developers of DIGITS are very helpful and open for suggestions

0 0