Running Google's im2txt under TensorFlow: Show and Tell with Inception v3

Source: Internet. Editor: 程序博客网. Time: 2024/06/05 21:02

My setup: Ubuntu 14.04 + GPU

TensorFlow 1.0.1

Related paper: "Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge"

https://arxiv.org/abs/1609.06647

The code was open-sourced just last September.

GitHub: https://github.com/tensorflow/models/tree/master/im2txt#generating-captions

Following the GitHub README, first install the dependencies.

Bazel

Following the official website:

$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list

$ curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -

$ sudo apt-get update && sudo apt-get install bazel

Error:

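Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming.

The following information may help to resolve the situation:

The following packages have unmet dependencies:

 bazel : Depends: google-jdk but it is not installable or

                  java8-jdk but it is not installable or

                  java8-sdk but it is not installable or

                  oracle-java8-installer but it is not installable

E: Unable to correct problems, you have held broken packages.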

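I tried countless fixes from the web, including switching package mirrors, and none of them worked, until I noticed this line on the official site: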

If you want to use the JDK 7, please replace jdk1.8 with jdk1.7 and if you want to install the testing version of Bazel, replace stable with testing.

The likely cause is that my system is Ubuntu 14.04, which uses JDK 7:

$ update-java-alternatives -l
# java-1.7.0-openjdk-amd64 1071 /usr/lib/jvm/java-1.7.0-openjdk-amd64
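
The choice between the two repository lines can be sketched in Python. This is only an illustration, not part of the official instructions: the helper name `bazel_apt_source` is hypothetical, and it assumes the JDK version can be read from an alternative name shaped like the one printed above.

```python
import re

BAZEL_APT_URL = "http://storage.googleapis.com/bazel-apt"


def bazel_apt_source(java_alternative, channel="stable"):
    """Build the Bazel apt source line matching the installed JDK.

    `java_alternative` is a name as printed by `update-java-alternatives -l`,
    for example "java-1.7.0-openjdk-amd64".
    """
    match = re.search(r"java-1\.(\d+)", java_alternative)
    if match is None:
        raise ValueError("cannot find a JDK version in %r" % java_alternative)
    version = match.group(1)
    # Only jdk1.7 and jdk1.8 are mentioned in the instructions quoted above.
    if version not in ("7", "8"):
        raise ValueError("unsupported JDK version 1.%s" % version)
    return "deb [arch=amd64] %s %s jdk1.%s" % (BAZEL_APT_URL, channel, version)


print(bazel_apt_source("java-1.7.0-openjdk-amd64"))
```

The printed line is what gets piped into `sudo tee /etc/apt/sources.list.d/bazel.list`.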

Continuing with the official instructions, this time with jdk1.7:

$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.7" | sudo tee /etc/apt/sources.list.d/bazel.list

$ curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -

$ sudo apt-get update && sudo apt-get install bazel

$ sudo apt-get upgrade bazel

Check that the installation succeeded:

$ /usr/bin/bazel version

NumPy

Following the official documentation:

https://www.scipy.org/install.html

$ python -m pip install --upgrade pip

$ pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose

Test:

$ python

>>> import scipy

>>> import numpy

>>> scipy.test()

>>> numpy.test()
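
The full test suites take a while, so a quicker first check is to see whether the packages can be imported at all. The sketch below assumes Python 3 (it relies on `importlib.util.find_spec`); the helper name `missing_modules` is made up for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names for some of the packages installed with pip above.
REQUIRED = ["numpy", "scipy", "matplotlib"]
print("missing:", missing_modules(REQUIRED))
```

An empty list means every module was found; it does not prove the packages work, which is what `scipy.test()` and `numpy.test()` are for.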

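Some posts online say these can also be installed through apt. I don't know how this differs from the method on the page the GitHub README links to.

$ sudo apt-get install python-scipy

$ sudo apt-get install python-numpy

$ sudo apt-get install python-matplotlib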

Natural Language Toolkit (NLTK):

First install NLTK:

http://www.nltk.org/install.html

$ sudo pip install -U nltk

$ sudo pip install -U numpy

$ python

>>> import nltk

Then install the NLTK data:

http://www.nltk.org/data.html

$ sudo python

>>> import nltk

>>> nltk.download()

Set the download directory to /usr/share/nltk_data.

Verify that the data is installed:

>>> from nltk.corpus import brown

>>> brown.words()

['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]

Once this step is done, also run:

>>> import nltk
>>> nltk.download('punkt')

so that the download and preprocessing step does not fail with the following error:

LookupError:
**********************************************************************
  Resource u'tokenizers/punkt/english.pickle' not found.  Please
  use the NLTK Downloader to obtain the resource:  >>>
  nltk.download()
  Searched in:
    - '/home/ubuntu/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - u''
**********************************************************************
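
The "look it up, download only if missing" pattern behind this fix can be sketched as follows. The helper `ensure_resource` is hypothetical; the lookup and download functions are passed in so the logic can be read and tried without NLTK installed.

```python
def ensure_resource(resource_path, package, find, download):
    """Download `package` only if `find(resource_path)` raises LookupError.

    Returns True if a download was triggered, False if the resource
    was already present.
    """
    try:
        find(resource_path)
        return False
    except LookupError:
        download(package)
        return True
```

With NLTK, the call would be `ensure_resource("tokenizers/punkt", "punkt", nltk.data.find, nltk.download)`; `nltk.data.find` raises `LookupError` for a missing resource, which is the error shown above.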

Preprocessing

# Location to save the MSCOCO data.
MSCOCO_DIR="${HOME}/im2txt/data/mscoco"
# Build the preprocessing script.
bazel build im2txt/download_and_preprocess_mscoco
# Run the preprocessing script.
bazel-bin/im2txt/download_and_preprocess_mscoco "${MSCOCO_DIR}"

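This step is simple, but on a poor network connection the download tends to restart for no obvious reason. I had to download several times, and each attempt took a very long time, which was exasperating.

When you see this, preprocessing has succeeded: [screenshot]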
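
Since the download can restart part-way, one way to double-check the result is to look for missing shard files. This is a sketch under assumptions: the shard names follow the `train-?????-of-00256` pattern used by the training command, the index is zero-based and zero-padded to five digits, and the helper name `missing_shards` is made up. The demo uses a temporary directory with 4 shards instead of the real data.

```python
import os
import tempfile


def missing_shards(directory, prefix="train", num_shards=256):
    """List expected shard file names that are absent from `directory`."""
    expected = ["%s-%05d-of-%05d" % (prefix, i, num_shards)
                for i in range(num_shards)]
    present = set(os.listdir(directory))
    return [name for name in expected if name not in present]


# Demo: create shards 0 and 2 out of 4, then ask which ones are missing.
with tempfile.TemporaryDirectory() as tmp:
    for i in (0, 2):
        open(os.path.join(tmp, "train-%05d-of-00004" % i), "w").close()
    print(missing_shards(tmp, num_shards=4))
    # → ['train-00001-of-00004', 'train-00003-of-00004']
```

For the real data the call would be `missing_shards(os.path.expanduser("~/im2txt/data/mscoco"))`, and an empty list means all 256 training shards are there.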

Training

$ MSCOCO_DIR="/path/to/MSCOCO"

$ INCEPTION_CHECKPOINT="/path/to/inception_v3.ckpt"

$ MODEL_DIR="/path/to/models/im2txt/model"

$ bazel build -c opt im2txt/...

$ bazel-bin/im2txt/train \

> --input_file_pattern="${MSCOCO_DIR}/train-?????-of-00256" \

> --inception_checkpoint_file="${INCEPTION_CHECKPOINT}" \

> --train_dir="${MODEL_DIR}/train" \

> --train_inception=false \

> --number_of_steps=1000000

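Error:

AttributeError: 'module' object has no attribute '_base'

Fix:

$ pip install --upgrade html5lib==1.0b8

With that fixed, the model starts training.

After that it is just a matter of waiting. People online say training takes one to two weeks.

Mine finished in about three to four days.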
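
To estimate how long your own run will take, the numbers above can be turned into a rough throughput. This is back-of-the-envelope arithmetic, assuming the run covered all 1,000,000 steps and took 3.5 days (the midpoint of "three to four"); the function names are made up.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def steps_per_second(total_steps, days):
    """Average training throughput over the whole run."""
    return total_steps / (days * SECONDS_PER_DAY)


def days_needed(total_steps, rate):
    """Days required for `total_steps` at `rate` steps per second."""
    return total_steps / (rate * SECONDS_PER_DAY)


rate = steps_per_second(1000000, 3.5)
print("%.2f steps/s" % rate)  # → 3.31 steps/s
# At half that throughput the same run would take twice as long.
print("%.1f days" % days_needed(1000000, rate / 2))  # → 7.0 days
```

At half my throughput the run takes about a week, which is in line with the one to two weeks reported online.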

The next stage is fine-tuning. Before that, test how well the trained model works.

If you would rather skip training, you can use my trained model directly.

Using an existing model

My trained model is available here:

https://github.com/withyou1771/im2txt

$ CHECKPOINT_PATH="/path/to/model.ckpt-1000000"
$ VOCAB_FILE="/path/to/word_counts.txt"
$ IMAGE_FILE="/path/to/models/im2txt/1.jpg"
$ bazel build -c opt im2txt/run_inference
$ bazel-bin/im2txt/run_inference \
 --checkpoint_path=${CHECKPOINT_PATH} \
 --vocab_file=${VOCAB_FILE} \
 --input_files=${IMAGE_FILE}

$ CHECKPOINT_PATH="/path/to/model.ckpt-1000000"
$ IMAGE_FILE="/path/to/1.jpg"
$ VOCAB_FILE="/path/to/word_counts.txt"
$ bazel build -c opt im2txt/run_inference
INFO: Found 1 target...
Target //im2txt:run_inference up-to-date:
bazel-bin/im2txt/run_inference
INFO: Elapsed time: 0.138s, Critical Path: 0.00s
(tensorflow)ubuntu@ubuntu-All-Series:/home/data1/tf/models/im2txt$ bazel-bin/im2txt/run_inference --checkpoint_path=${CHECKPOINT_PATH} --vocab_file=${VOCAB_FILE} --input_files=${IMAGE_FILE}
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
INFO:tensorflow:Building model.
INFO:tensorflow:Initializing vocabulary from file: /word_counts.txt
INFO:tensorflow:Created vocabulary with 11520 words
INFO:tensorflow:Running caption generation on 1 files matching /1.jpg
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:0a:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x541a970
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:09:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x541e2f0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 2 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:06:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x5421c70
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 3 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:05:00.0
Total memory: 7.92GiB
Free memory: 7.57GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 2 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 2: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 3: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:0a:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:09:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX 1080, pci bus id: 0000:06:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GeForce GTX 1080, pci bus id: 0000:05:00.0)
INFO:tensorflow:Loading model from checkpoint: /home/data1/tf/model4/model.ckpt-1000000
INFO:tensorflow:Successfully loaded checkpoint: model.ckpt-1000000
Captions for image 1.jpg:
  0) a cat laying on top of a grass covered field . (p=0.002806)
  1) a black and white cat laying on top of a grass covered field . (p=0.000498)
  2) a black and white cat laying on top of a green field . (p=0.000412)

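Processing one image takes about 8 seconds, most of which is spent loading CUDA.

Several images can be processed in one run: just separate the paths with commas. Processing 5 images together took 26 seconds, about 5 seconds per image.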
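
Building the comma-separated `--input_files` value by hand gets tedious with many images, so here is a small sketch that assembles the command. The helper name `inference_command` and the second image path are hypothetical; the flags are the ones used in the command above.

```python
def inference_command(checkpoint_path, vocab_file, image_files):
    """Return the run_inference command as an argument list."""
    return [
        "bazel-bin/im2txt/run_inference",
        "--checkpoint_path=" + checkpoint_path,
        "--vocab_file=" + vocab_file,
        # Multiple images go into one flag, separated by commas.
        "--input_files=" + ",".join(image_files),
    ]


cmd = inference_command(
    "/path/to/model.ckpt-1000000",
    "/path/to/word_counts.txt",
    ["/path/to/1.jpg", "/path/to/2.jpg"],  # second path is a placeholder
)
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd)` from the im2txt directory. Batching pays off because the startup cost is paid once: 26 s / 5 ≈ 5.2 s per image, versus about 8 s for a single image.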

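Because of differences between Python and TensorFlow versions, you may run into many errors even when using someone else's model.

Error 1:
ValueError: No checkpoint file found in: None
The format of word_counts.txt is wrong.
Replace line 49 of vocabulary.py with:
reverse_vocab = [eval(line.split()[0]) for line in reverse_vocab]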
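
Since `eval` executes whatever is in the file, a more cautious variant of the same idea is sketched below. It rests on an assumption this post does not confirm: that the malformed word_counts.txt stores words as Python bytes literals such as `b'cat'`. The helper `parse_vocab_word` is hypothetical, and unlike the one-line fix it always returns text and leaves ordinary words untouched.

```python
import ast


def parse_vocab_word(line):
    """Extract the word from a 'word count' line of word_counts.txt.

    Words written as string or bytes literals (for example b'cat') are
    unwrapped; anything else is returned exactly as it appears.
    """
    token = line.split()[0]
    try:
        value = ast.literal_eval(token)
    except (ValueError, SyntaxError):
        return token  # an ordinary word such as cat or <S>
    if isinstance(value, bytes):
        return value.decode("utf-8")
    if isinstance(value, str):
        return value
    return token  # e.g. the word "1" must not turn into the integer 1


print(parse_vocab_word("b'cat' 42"))  # → cat
print(parse_vocab_word("<S> 10"))     # → <S>
```

`ast.literal_eval` only accepts literals, so a stray expression in the file cannot run code.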

Error 2:
NotFoundError (see above for traceback): Tensor name "lstm/basic_lstm
ts" not found in checkpoint files /home/data1/tf/model2/model.ckpt-30
[[Node: save/RestoreV2_381 = RestoreV2[dtypes=[DT_FLOAT], _d
:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV
r_names, save/RestoreV2_381/shape_and_slices)]]
[[Node: save/RestoreV2_295/_177 = _Recv[client_terminated=fa
evice="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:loca
ca:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_1333
reV2_295", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/ta
]()]]

In TensorFlow 1.0, BasicLSTMCell changed its default variable names, so they no longer match the names stored in the checkpoint.
Related material:
https://www.tensorflow.org/install/migration
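
To make the mismatch concrete, here is a pure-Python sketch of the name mapping involved. The old and new names in `ASSUMED_RENAMES` are assumptions taken from commonly reported cases of this error, not from the truncated log above, and `other/variable` is a placeholder; check the names in your own checkpoint first. Rewriting the checkpoint itself requires TensorFlow and is not shown.

```python
# Assumed old -> new variable names; verify against your own checkpoint.
ASSUMED_RENAMES = {
    "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/weights",
    "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/biases",
}


def plan_renames(checkpoint_names, renames=ASSUMED_RENAMES):
    """Map each checkpoint variable name to the name the new code expects."""
    return {name: renames.get(name, name) for name in checkpoint_names}


def unmatched(expected_names, checkpoint_names, renames=ASSUMED_RENAMES):
    """Names the model expects that are still absent after renaming."""
    available = set(plan_renames(checkpoint_names, renames).values())
    return sorted(set(expected_names) - available)


old_checkpoint = ["lstm/BasicLSTMCell/Linear/Matrix",
                  "lstm/BasicLSTMCell/Linear/Bias",
                  "other/variable"]
expected = ["lstm/basic_lstm_cell/weights",
            "lstm/basic_lstm_cell/biases",
            "other/variable"]

print(unmatched(expected, old_checkpoint, renames={}))  # without renaming
print(unmatched(expected, old_checkpoint))              # with renaming
```

Without the mapping, the two LSTM variables are reported as not found, which is the situation in Error 2; with it, nothing is left unmatched.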

0 0