[TensorFlow] im2txt: Converting Images into Descriptive Text


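The complete project has been uploaded to Github: im2txt
The model has to be downloaded separately, however, because the free tier of Github does not accept files larger than 100 MB.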

Download im2txt


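tensorflow/models contains many models, but only im2txt is needed here. Downloading a single subfolder from Github is awkward, so clone the whole models repository instead; the other models may come in handy later.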

git clone https://github.com/tensorflow/models.git

Once the clone finishes, copy the models/research/im2txt/im2txt folder into your working area.

Install the required packages


First, install all the required packages as described in the im2txt instructions on Github:

  • Bazel (instructions)
  • TensorFlow 1.0 or later (instructions)
  • NumPy (instructions)
  • Natural Language Toolkit (NLTK)
    • First install NLTK (instructions)
    • Then download the NLTK data (instructions)
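
Before going further, it can help to confirm that the Python packages from the list above are importable. The following is a minimal sketch, not part of im2txt: the function name find_missing_modules is made up for illustration, and Bazel is not checked because it is a command-line tool rather than a Python module.

```python
import importlib.util

# Import names of the Python packages listed above (assumed to match
# the names used in "import ..." statements).
REQUIRED_MODULES = ("tensorflow", "numpy", "nltk")


def find_missing_modules(module_names=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required Python packages were found.")
```

The check only locates the modules and does not import them, so it says nothing about whether the installed TensorFlow version is 1.0 or later.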

Download the model and vocabulary


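Training the model yourself means, according to the official instructions, spending several hours downloading the dataset, then training for 1 to 2 weeks, and finally fine-tuning for several more weeks.

Since training takes that long, use a pretrained model instead. It can be downloaded from: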

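  • Original link (requires a VPN)
  • Netdisk, password: 9bun (download speed may be deliberately throttled)
  • Github (the free tier does not accept files larger than 100 MB)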

After downloading, place the files in the im2txt/model folder:

im2txt/
    ......
    model/
        graph.pbtxt
        model.ckpt-2000000
        model.ckpt-2000000.meta

Also download word_counts.txt, the file containing the vocabulary, and place it in the data folder:

im2txt/
    ......
    data/
        ......
        word_counts.txt
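
A quick way to confirm that the files ended up in the right place is to check the layout programmatically. This is only a sketch: the file names come from the two directory listings above, while the helper name missing_files and the idea of checking from Python are illustrative assumptions.

```python
import os

# Relative paths taken from the directory layouts shown above.
EXPECTED_FILES = (
    os.path.join("model", "graph.pbtxt"),
    os.path.join("model", "model.ckpt-2000000"),
    os.path.join("model", "model.ckpt-2000000.meta"),
    os.path.join("data", "word_counts.txt"),
)


def missing_files(im2txt_dir, expected=EXPECTED_FILES):
    """Return the expected files that do not exist under im2txt_dir."""
    return [rel for rel in expected
            if not os.path.isfile(os.path.join(im2txt_dir, rel))]


if __name__ == "__main__":
    # "im2txt" is assumed to be a folder in the current directory.
    for rel in missing_files("im2txt"):
        print("missing:", rel)
```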

Write the script


Create a script file named run.sh in the im2txt folder and enter the following commands:

CHECKPOINT_PATH="${HOME}/im2txt/model/train"
VOCAB_FILE="${HOME}/im2txt/data/mscoco/word_counts.txt"
IMAGE_FILE="${HOME}/im2txt/data/mscoco/raw-data/val2014/COCO_val2014_000000224477.jpg"
bazel build -c opt //im2txt:run_inference
bazel-bin/im2txt/run_inference \
  --checkpoint_path=${CHECKPOINT_PATH} \
  --vocab_file=${VOCAB_FILE} \
  --input_files=${IMAGE_FILE}

Replace the variables with your own paths. For example, these are my current settings:

CHECKPOINT_PATH="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/model.ckpt-2000000"
VOCAB_FILE="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/word_counts.txt"
IMAGE_FILE="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/images/1.jpg"
bazel build -c opt //im2txt:run_inference
bazel-bin/im2txt/run_inference \
  --checkpoint_path=${CHECKPOINT_PATH} \
  --vocab_file=${VOCAB_FILE} \
  --input_files=${IMAGE_FILE}
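
The same invocation can also be assembled from Python, which makes it easier to caption several images in a loop. The sketch below only builds the argument list. The three flags and the binary path are the ones used in run.sh, whereas the function name build_inference_command and the comma-joined --input_files value are assumptions that should be checked against run_inference.py.

```python
import subprocess


def build_inference_command(checkpoint_path, vocab_file, image_files,
                            binary="bazel-bin/im2txt/run_inference"):
    """Build the run_inference argument list used in run.sh.

    Joining several images with commas is an assumption about how
    --input_files is parsed; pass a single image to match run.sh exactly.
    """
    return [
        binary,
        "--checkpoint_path=" + checkpoint_path,
        "--vocab_file=" + vocab_file,
        "--input_files=" + ",".join(image_files),
    ]


if __name__ == "__main__":
    # Placeholder paths; replace them with your own, as in run.sh.
    command = build_inference_command(
        "im2txt/model/model.ckpt-2000000",
        "im2txt/data/word_counts.txt",
        ["im2txt/data/images/1.jpg"],
    )
    # The binary must already have been built with bazel build.
    subprocess.run(command, check=True)
```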

Run the script


Change the working directory to im2txt and set the script's permissions:

sudo chmod 777 run.sh

Then change the working directory to the parent directory of im2txt and run the script:

./im2txt/run.sh

The output is shown below, and the captions look quite good:

INFO: Analysed target //im2txt:run_inference (0 packages loaded).
INFO: Found 1 target...
Target //im2txt:run_inference up-to-date:
  bazel-bin/im2txt/run_inference
INFO: Elapsed time: 0.164s, Critical Path: 0.01s
INFO: Build completed successfully, 1 total action
INFO:tensorflow:Building model.
INFO:tensorflow:Initializing vocabulary from file: /home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/word_counts.txt
INFO:tensorflow:Created vocabulary with 11520 words
INFO:tensorflow:Running caption generation on 1 files matching /home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/images/1.jpg
INFO:tensorflow:Loading model from checkpoint: /home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/newmodel.ckpt-2000000
INFO:tensorflow:Restoring parameters from /home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/newmodel.ckpt-2000000
INFO:tensorflow:Successfully loaded checkpoint: newmodel.ckpt-2000000
Captions for image 1.jpg:
  0) a man riding a wave on top of a surfboard . (p=0.035667)
  1) a person riding a surf board on a wave (p=0.016235)
  2) a man on a surfboard riding a wave . (p=0.010144)
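
If the captions are needed by another program, the ranked lines at the end of the output can be parsed. The following is a rough sketch that assumes the caption lines always look like the three shown above (rank, text, then the probability in parentheses); the function name parse_captions is made up for illustration.

```python
import re

# Matches lines such as "  0) a man riding a wave . (p=0.035667)".
CAPTION_LINE = re.compile(r"^\s*(\d+)\)\s+(.*?)\s+\(p=([0-9.]+)\)\s*$")


def parse_captions(output_text):
    """Return a list of (rank, caption, probability) tuples found in the text."""
    captions = []
    for line in output_text.splitlines():
        match = CAPTION_LINE.match(line)
        if match:
            rank, text, prob = match.groups()
            captions.append((int(rank), text, float(prob)))
    return captions


if __name__ == "__main__":
    sample = (
        "Captions for image 1.jpg:\n"
        "  0) a man riding a wave on top of a surfboard . (p=0.035667)\n"
        "  1) a person riding a surf board on a wave (p=0.016235)\n"
    )
    for rank, text, prob in parse_captions(sample):
        print(rank, prob, text)
```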

The bazel build command also creates several folders in the same directory as WORKSPACE:

bazel-bin/
bazel-genfiles/
bazel-out/
bazel-testlogs/
......

bazel-bin contains the compiled run_inference binary, which run.sh invokes.

Summary of errors


1. Build error

When run.sh is executed, bazel's build command only works from inside a workspace:

ERROR: The 'build' command is only supported from within a workspace.

The fix is to create a WORKSPACE file in the directory from which run.sh is executed:

touch WORKSPACE
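
To see which directory bazel would treat as the workspace, one can walk up from the current directory looking for that marker file. This is a simplified sketch of the idea rather than bazel's actual lookup: it only checks for a file named WORKSPACE, as used in this article, and find_workspace_root is a hypothetical helper.

```python
import os


def find_workspace_root(start_dir):
    """Return the nearest ancestor of start_dir containing a WORKSPACE file.

    Returns None when no ancestor (including start_dir itself) has one.
    """
    current = os.path.abspath(start_dir)
    while True:
        if os.path.isfile(os.path.join(current, "WORKSPACE")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent


if __name__ == "__main__":
    root = find_workspace_root(os.getcwd())
    print("workspace root:", root if root else "not inside a workspace")
```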

2. im2txt package not found

Running run.sh fails with an error saying the im2txt package cannot be found:

ERROR: Skipping '//im2txt:run_inference': no such package 'im2txt': BUILD file not found on package path
WARNING: Target pattern parsing failed.
ERROR: no such package 'im2txt': BUILD file not found on package path
INFO: Elapsed time: 0.107s
FAILED: Build did NOT complete successfully (0 packages loaded)
./run.sh: 9: ./run.sh: bazel-bin/im2txt/run_inference: not found

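This happens because the script was not run from the parent directory of im2txt. The fix is to run run.sh from that parent directory.

Alternatively, add a command to run.sh that moves up to the parent directory: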

CHECKPOINT_PATH="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/model.ckpt-2000000"
VOCAB_FILE="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/word_counts.txt"
IMAGE_FILE="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/images/1.jpg"
cd .. # move up to the parent directory
bazel build -c opt //im2txt:run_inference
bazel-bin/im2txt/run_inference \
  --checkpoint_path=${CHECKPOINT_PATH} \
  --vocab_file=${VOCAB_FILE} \
  --input_files=${IMAGE_FILE}

Then run it directly from the directory that contains run.sh:

./run.sh

3. lstm/basic_lstm_cell/××× not found

When run.sh runs, TensorFlow cannot find lstm/basic_lstm_cell/××× in the model:

# Error 1
NotFoundError: Tensor name "lstm/basic_lstm_cell/bias" not found in checkpoint files
# Error 2
NotFoundError: Key lstm/basic_lstm_cell/kernel not found in checkpoint

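This is because the LSTM variable names differ between TF1.0 and TF1.2, and the names used before TF1.0 differ from those in TF1.0 as well, so the mapping has to be adjusted according to the actual error message.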

TF1.0                           TF1.2
lstm/basic_lstm_cell/weights    lstm/basic_lstm_cell/kernel
lstm/basic_lstm_cell/biases     lstm/basic_lstm_cell/bias


The fix is to create a file named rename_ckpt.py and use the following function to convert the existing trained model:

import tensorflow as tf


def rename_ckpt():
    # The names depend on the TensorFlow version, so adjust this
    # mapping (old name -> new name) to match the actual error message
    vars_to_rename = {
        "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/bias",
        "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/kernel"
    }
    new_checkpoint_vars = {}
    reader = tf.train.NewCheckpointReader(
        "/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/model.ckpt-2000000"
    )
    # Copy every variable, renaming only those listed in the mapping
    for old_name in reader.get_variable_to_shape_map():
        if old_name in vars_to_rename:
            new_name = vars_to_rename[old_name]
        else:
            new_name = old_name
        new_checkpoint_vars[new_name] = tf.Variable(
            reader.get_tensor(old_name))
    init = tf.global_variables_initializer()
    saver = tf.train.Saver(new_checkpoint_vars)
    # Save the renamed variables as a new checkpoint
    with tf.Session() as sess:
        sess.run(init)
        saver.save(
            sess,
            "/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/newmodel.ckpt-2000000"
        )
    print("checkpoint file rename successful... ")


if __name__ == '__main__':
    rename_ckpt()

Run the rename_ckpt.py script. A successful run prints the following:

$ python rename_ckpt.py
checkpoint file rename successful...

Several new files now appear in the model folder:

model/
    ......
    checkpoint
    newmodel.ckpt-2000000.data-00000-of-00001
    newmodel.ckpt-2000000.index
    newmodel.ckpt-2000000.meta

CHECKPOINT_PATH in run.sh must also be changed to point to the renamed ckpt file:

CHECKPOINT_PATH="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/newmodel.ckpt-2000000"
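
Because the right entries for vars_to_rename depend on which names the checkpoint actually contains, it may help to work out the mapping before rewriting anything. The sketch below is plain Python and does not touch TensorFlow. The four name pairs are the ones that appear in this article, while plan_renames is a hypothetical helper and the combined table is an assumption, not an official list.

```python
# Older variable names (keys) mapped to the names TF1.2 expects (values).
# The first two pairs come from the TF1.0/TF1.2 table, the last two from
# rename_ckpt.py; extend the table according to your own error message.
OLD_TO_NEW = {
    "lstm/basic_lstm_cell/weights": "lstm/basic_lstm_cell/kernel",
    "lstm/basic_lstm_cell/biases": "lstm/basic_lstm_cell/bias",
    "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/kernel",
    "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/bias",
}


def plan_renames(checkpoint_names, mapping=OLD_TO_NEW):
    """Return {old_name: new_name} for checkpoint names that need renaming."""
    return {name: mapping[name] for name in checkpoint_names if name in mapping}


if __name__ == "__main__":
    # In practice the names would come from
    # reader.get_variable_to_shape_map() as in rename_ckpt.py.
    names = ["lstm/BasicLSTMCell/Linear/Bias", "logits/weights"]
    print(plan_renames(names))
```

An empty result suggests the checkpoint already uses the newer names, or uses names that are not in the table yet.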

4. Image read error

Running run.sh produces an encoding error, and the traceback shows that it is raised while reading the image:

Traceback (most recent call last):
  File "/home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/bazel-bin/im2txt/run_inference.runfiles/__main__/im2txt/run_inference.py", line 85, in <module>
    tf.app.run()
  File "/home/widiot/workspace/tensorflow-space/tensorflow-gpu/venv/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 129, in run
    _sys.exit(main(argv))
  File "/home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/bazel-bin/im2txt/run_inference.runfiles/__main__/im2txt/run_inference.py", line 74, in main
    image = f.read()
......
'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

The fix is to change how the image file is opened. In my case the failing file is run_inference.py:

for filename in filenames:
  with tf.gfile.GFile(filename, "r") as f:
    image = f.read()
# Change line 73 to
for filename in filenames:
  with tf.gfile.GFile(filename, "rb") as f:
    image = f.read()
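
The reason the mode matters is that a JPEG file starts with the byte 0xff, which is not valid at the start of a UTF-8 sequence, so any attempt to decode the file as text fails immediately. The sketch below reproduces this with Python's built-in open as a stand-in for tf.gfile.GFile (an assumption made to keep the example free of TensorFlow); the four bytes written are only a typical JPEG header, not a real image.

```python
import os
import tempfile

# The first bytes of a typical JPEG file; 0xff is not a valid UTF-8 start byte.
JPEG_MAGIC = b"\xff\xd8\xff\xe0"


def read_as_text(path):
    """Read the file in text mode, decoding it as UTF-8."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def read_as_bytes(path):
    """Read the file in binary mode, with no decoding step."""
    with open(path, "rb") as f:
        return f.read()


def demo():
    """Return (text_mode_failed, raw_bytes) for a tiny fake JPEG file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fake.jpg")
        with open(path, "wb") as f:
            f.write(JPEG_MAGIC)
        try:
            read_as_text(path)
            text_mode_failed = False
        except UnicodeDecodeError:
            text_mode_failed = True
        return text_mode_failed, read_as_bytes(path)


if __name__ == "__main__":
    failed, data = demo()
    print("text mode failed:", failed)
    print("bytes read in binary mode:", data)
```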

This error is reported in the corresponding issue.

5. Identical captions in the output

After run.sh runs, every caption in the output is the same, and each one is followed by many trailing . <S> tokens:

......
Captions for image 1.jpg:
  0) a man riding a wave on top of a surfboard . <S> . <S> . <S> <S> . (p=0.001145)
  1) a man riding a wave on top of a surfboard . <S> . <S> <S> . <S> <S> (p=0.000888)
  2) a man riding a wave on top of a surfboard . <S> . <S> <S> . <S> <S> (p=0.000658)

Looking at the code, I found that caption_generator.py contains a statement that checks whether a word is the end token </S>:

......
# Line 194
if w == self.vocab.end_id:
  if self.length_normalization_factor > 0:
......

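This condition always evaluated to False. Comparing the value of w with the value of end_id showed that w=2 while end_id=3.

I then checked word_counts.txt and found <S> at position 2 and </S> at position 3, which does not match what the model in the code outputs: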

a 969108
<S> 586368
</S> 586368
. 440479
on 213612
of 202290
......

After swapping the positions of these two tokens and rerunning run.sh, the results were normal:

......
Captions for image 1.jpg:
  0) a man riding a wave on top of a surfboard . (p=0.035667)
  1) a person riding a surf board on a wave (p=0.016235)
  2) a man on a surfboard riding a wave . (p=0.010144)
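
The swap can be done by hand in a text editor, but a small script makes the check repeatable. The sketch below works on the lines of word_counts.txt and assumes each line has the form "word count" as in the excerpt above; token_positions and swap_tokens are hypothetical helpers, and the positions they report are zero-based line indices rather than the model's actual ids.

```python
def token_positions(lines, tokens=("<S>", "</S>")):
    """Return the zero-based line index of each token in the vocabulary lines."""
    words = [line.split()[0] for line in lines if line.strip()]
    return {token: words.index(token) for token in tokens}


def swap_tokens(lines, first="<S>", second="</S>"):
    """Return a copy of lines with the two token lines exchanged."""
    result = list(lines)
    i = next(k for k, line in enumerate(result) if line.split()[:1] == [first])
    j = next(k for k, line in enumerate(result) if line.split()[:1] == [second])
    result[i], result[j] = result[j], result[i]
    return result


if __name__ == "__main__":
    sample = ["a 969108", "<S> 586368", "</S> 586368", ". 440479"]
    print("before:", token_positions(sample))
    print("after: ", token_positions(swap_tokens(sample)))
```

To apply it to the real file, read word_counts.txt into a list of lines, pass the list through swap_tokens, and write the result back, keeping a backup of the original.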

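This word_counts.txt was one I had found from someone else. I did not expect it to contain an error like this; it really changed my idea of what a bug can be.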