[TensorFlow] im2txt: Turning Images into Descriptive Text
Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 22:49
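The complete project has been uploaded to GitHub: im2txt
The model has to be downloaded separately, because the free tier of GitHub does not accept files larger than 100 MB.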
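Downloading im2txt
tensorflow/models contains many models, but only im2txt is needed here. Downloading a single subfolder from GitHub is awkward, so clone the whole models repository; the other models may come in handy later.
    git clone https://github.com/tensorflow/models.git
Once the clone finishes, copy the models/research/im2txt/im2txt folder into your workspace.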
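Installing the required packages
First install all the required packages, following the im2txt instructions on GitHub:
- Bazel (instructions)
- TensorFlow 1.0 or later (instructions)
- NumPy (instructions)
- Natural Language Toolkit (NLTK)
  - First install NLTK (instructions)
  - Then download the NLTK data (instructions)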
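Downloading the model and vocabulary
To train the model yourself, according to the official instructions you first spend several hours downloading the dataset, then train for 1 to 2 weeks, and finally fine-tune for several more weeks.
Training takes a long time, so use a pre-trained model instead. It can be downloaded from:
- Original address (requires a VPN)
- Netdisk, password: 9bun (the download speed may be throttled)
- GitHub (the free tier does not accept files larger than 100 MB)
After downloading, put the files in the im2txt/model folder: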
    im2txt/
        ......
        model/
            graph.pbtxt
            model.ckpt-2000000
            model.ckpt-2000000.meta
Also download word_counts.txt, the file that contains the vocabulary, and put it in the data folder:
    im2txt/
        ......
        data/
            ......
            word_counts.txt
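Before going further, it can help to confirm that the files ended up where the layout above expects them. The following is a minimal sketch, not part of im2txt; the function name missing_files and the placeholder folder "im2txt" are assumptions made for illustration, and the file names are the ones listed above.

```python
import os

# Relative paths taken from the folder layout described above.
EXPECTED_FILES = [
    os.path.join("model", "graph.pbtxt"),
    os.path.join("model", "model.ckpt-2000000"),
    os.path.join("model", "model.ckpt-2000000.meta"),
    os.path.join("data", "word_counts.txt"),
]


def missing_files(im2txt_dir):
    """Return the expected files that do not exist under im2txt_dir."""
    return [
        rel for rel in EXPECTED_FILES
        if not os.path.isfile(os.path.join(im2txt_dir, rel))
    ]


if __name__ == "__main__":
    # "im2txt" is a placeholder; point it at your own copy of the folder.
    for rel in missing_files("im2txt"):
        print("missing:", rel)
```

If the script prints nothing, all four files are in place.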
Writing the script
Create a run.sh script in the im2txt folder with the following commands:
    # Paths to the checkpoint, the vocabulary file and the image to caption.
    CHECKPOINT_PATH="${HOME}/im2txt/model/train"
    VOCAB_FILE="${HOME}/im2txt/data/mscoco/word_counts.txt"
    IMAGE_FILE="${HOME}/im2txt/data/mscoco/raw-data/val2014/COCO_val2014_000000224477.jpg"
    # Build the inference binary.
    bazel build -c opt //im2txt:run_inference
    # Run inference on the image.
    bazel-bin/im2txt/run_inference \
      --checkpoint_path=${CHECKPOINT_PATH} \
      --vocab_file=${VOCAB_FILE} \
      --input_files=${IMAGE_FILE}
Replace the variables with your own paths. For example, these are the paths I currently use:
    CHECKPOINT_PATH="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/model.ckpt-2000000"
    VOCAB_FILE="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/word_counts.txt"
    IMAGE_FILE="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/images/1.jpg"
    bazel build -c opt //im2txt:run_inference
    bazel-bin/im2txt/run_inference \
      --checkpoint_path=${CHECKPOINT_PATH} \
      --vocab_file=${VOCAB_FILE} \
      --input_files=${IMAGE_FILE}
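Running the script
Set the current working directory to im2txt and make the script executable:
    sudo chmod 777 run.sh
Then change to the parent directory of im2txt and run the script:
    ./im2txt/run.sh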
The output is shown below, and the captions look quite good:
    INFO: Analysed target //im2txt:run_inference (0 packages loaded).
    INFO: Found 1 target...
    Target //im2txt:run_inference up-to-date:
      bazel-bin/im2txt/run_inference
    INFO: Elapsed time: 0.164s, Critical Path: 0.01s
    INFO: Build completed successfully, 1 total action
    INFO:tensorflow:Building model.
    INFO:tensorflow:Initializing vocabulary from file: /home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/word_counts.txt
    INFO:tensorflow:Created vocabulary with 11520 words
    INFO:tensorflow:Running caption generation on 1 files matching /home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/images/1.jpg
    INFO:tensorflow:Loading model from checkpoint: /home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/newmodel.ckpt-2000000
    INFO:tensorflow:Restoring parameters from /home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/newmodel.ckpt-2000000
    INFO:tensorflow:Successfully loaded checkpoint: newmodel.ckpt-2000000
    Captions for image 1.jpg:
      0) a man riding a wave on top of a surfboard . (p=0.035667)
      1) a person riding a surf board on a wave (p=0.016235)
      2) a man on a surfboard riding a wave . (p=0.010144)
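If the captions need to be processed further, the caption lines in this output can be pulled apart with a regular expression. This is a minimal sketch that assumes the line format shown above (rank, caption, then a decimal probability in parentheses); parse_captions is a hypothetical helper, not something im2txt provides, and it would not match probabilities printed in scientific notation.

```python
import re

# Matches lines such as "0) a man riding a wave . (p=0.035667)".
CAPTION_RE = re.compile(r"^\s*(\d+)\)\s+(.*?)\s+\(p=([0-9.]+)\)\s*$")


def parse_captions(output_text):
    """Return a list of (rank, caption, probability) tuples found in the text."""
    captions = []
    for line in output_text.splitlines():
        match = CAPTION_RE.match(line)
        if match:
            rank, text, prob = match.groups()
            captions.append((int(rank), text, float(prob)))
    return captions
```

Lines that are not captions, such as the INFO messages, are simply skipped.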
The bazel build command also creates several folders next to the WORKSPACE file:
    bazel-bin/
    bazel-genfiles/
    bazel-out/
    bazel-testlogs/
    ......
bazel-bin contains the compiled run_inference binary, which run.sh invokes.
Error summary
1. build error
When running run.sh, Bazel's build command only works inside a workspace:
    ERROR: The 'build' command is only supported from within a workspace.
The fix is to create a WORKSPACE file in the directory from which run.sh is executed:
    touch WORKSPACE
2. im2txt package not found
When running run.sh, Bazel reports that it cannot find the im2txt package:
    ERROR: Skipping '//im2txt:run_inference': no such package 'im2txt': BUILD file not found on package path
    WARNING: Target pattern parsing failed.
    ERROR: no such package 'im2txt': BUILD file not found on package path
    INFO: Elapsed time: 0.107s
    FAILED: Build did NOT complete successfully (0 packages loaded)
    ./run.sh: 9: ./run.sh: bazel-bin/im2txt/run_inference: not found
This happens because the script was not run from the parent directory of im2txt. The fix is to run run.sh from the parent directory of im2txt.
Alternatively, add a command to run.sh that changes to the parent directory:
    CHECKPOINT_PATH="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/model.ckpt-2000000"
    VOCAB_FILE="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/word_counts.txt"
    IMAGE_FILE="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/data/images/1.jpg"
    cd .. # go back to the parent directory
    bazel build -c opt //im2txt:run_inference
    bazel-bin/im2txt/run_inference \
      --checkpoint_path=${CHECKPOINT_PATH} \
      --vocab_file=${VOCAB_FILE} \
      --input_files=${IMAGE_FILE}
Then run the script directly from the directory that contains run.sh:
    ./run.sh
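Errors 1 and 2 both come down to where the script is run from: Bazel needs a WORKSPACE file in the current directory or one of its ancestors, and the im2txt package (with its BUILD file) must sit directly under that directory. The check can be sketched in Python as follows. The function names are made up for illustration, and the sketch only looks for the file names mentioned in this post, so treat it as a simplification of what Bazel actually does.

```python
import os


def find_workspace_root(start_dir):
    """Walk upward from start_dir; return the first directory that
    contains a WORKSPACE file, or None if there is none."""
    current = os.path.abspath(start_dir)
    while True:
        if os.path.isfile(os.path.join(current, "WORKSPACE")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent


def check_layout(start_dir):
    """Describe whether //im2txt:run_inference could be resolved from start_dir."""
    root = find_workspace_root(start_dir)
    if root is None:
        return "no WORKSPACE file found; create one with: touch WORKSPACE"
    if not os.path.isfile(os.path.join(root, "im2txt", "BUILD")):
        return "WORKSPACE found at %s, but im2txt/BUILD is missing there" % root
    return "ok: run bazel from %s" % root


if __name__ == "__main__":
    print(check_layout(os.getcwd()))
```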
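3. lstm/basic_lstm_cell/××× not found
When running run.sh, TensorFlow cannot find lstm/basic_lstm_cell/××× in the model:
    # Error 1
    NotFoundError: Tensor name "lstm/basic_lstm_cell/bias" not found in checkpoint files
    # Error 2
    NotFoundError: Key lstm/basic_lstm_cell/kernel not found in checkpoint
This happens because the LSTM variable names differ between TF 1.0 and TF 1.2, and the names used before TF 1.0 differ from those in TF 1.0 as well, so the mapping has to be adjusted according to the actual error message.
The fix is to create a rename_ckpt.py file and convert the existing trained model as follows: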
    import tensorflow as tf


    def rename_ckpt():
        # The names depend on the TensorFlow version, so adjust this
        # mapping according to the actual error message.
        vars_to_rename = {
            "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/bias",
            "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/kernel"
        }
        new_checkpoint_vars = {}
        reader = tf.train.NewCheckpointReader(
            "/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/model.ckpt-2000000"
        )
        # Copy every variable, renaming the ones listed in the mapping.
        for old_name in reader.get_variable_to_shape_map():
            if old_name in vars_to_rename:
                new_name = vars_to_rename[old_name]
            else:
                new_name = old_name
            new_checkpoint_vars[new_name] = tf.Variable(
                reader.get_tensor(old_name))

        init = tf.global_variables_initializer()
        saver = tf.train.Saver(new_checkpoint_vars)

        # Save the renamed variables as a new checkpoint.
        with tf.Session() as sess:
            sess.run(init)
            saver.save(
                sess,
                "/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/newmodel.ckpt-2000000"
            )
        print("checkpoint file rename successful... ")


    if __name__ == '__main__':
        rename_ckpt()
Run the rename_ckpt.py script. On success the output looks like this:
    $ python rename_ckpt.py
    checkpoint file rename successful...
Several new files now appear in the model folder:
    model/
        ......
        checkpoint
        newmodel.ckpt-2000000.data-00000-of-00001
        newmodel.ckpt-2000000.index
        newmodel.ckpt-2000000.meta
The CHECKPOINT_PATH in run.sh also has to point to the converted ckpt file:
    CHECKPOINT_PATH="/home/w/workspace/tensorflow-space/tensorflow-gpu/practices/im2txt/model/newmodel.ckpt-2000000"
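Because the right mapping depends on the TensorFlow version, it is worth checking the mapping against the names actually stored in the checkpoint before saving anything. The sketch below isolates that step in plain Python, without TensorFlow. The mapping is copied from rename_ckpt.py; plan_renames and unmatched_keys are hypothetical helpers, and the list of names would come from reader.get_variable_to_shape_map() in the real script.

```python
# Old-to-new variable names, copied from rename_ckpt.py.
VARS_TO_RENAME = {
    "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/bias",
    "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/kernel",
}


def plan_renames(checkpoint_names, vars_to_rename=VARS_TO_RENAME):
    """Map every checkpoint variable name to the name it will be saved under."""
    return {name: vars_to_rename.get(name, name) for name in checkpoint_names}


def unmatched_keys(checkpoint_names, vars_to_rename=VARS_TO_RENAME):
    """Return mapping keys that do not occur in the checkpoint at all.

    A non-empty result suggests the mapping was written for a different
    TensorFlow version than the one that produced the checkpoint.
    """
    present = set(checkpoint_names)
    return sorted(key for key in vars_to_rename if key not in present)
```

If unmatched_keys returns anything, the old names in the mapping probably need to be changed to match the names the checkpoint really contains.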
4. Image reading error
When running run.sh, an encoding error appears, and the traceback shows that it occurs while reading the image:
    Traceback (most recent call last):
      File "/home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/bazel-bin/im2txt/run_inference.runfiles/__main__/im2txt/run_inference.py", line 85, in <module>
        tf.app.run()
      File "/home/widiot/workspace/tensorflow-space/tensorflow-gpu/venv/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 129, in run
        _sys.exit(main(argv))
      File "/home/widiot/workspace/tensorflow-space/tensorflow-gpu/practices/bazel-bin/im2txt/run_inference.runfiles/__main__/im2txt/run_inference.py", line 74, in main
        image = f.read()
    ......
    'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
The fix is to change the mode in which the image is opened from text ("r") to binary ("rb"). In my case the file that raised the error was run_inference.py:
    for filename in filenames:
      with tf.gfile.GFile(filename, "r") as f:
        image = f.read()
    # Change line 73 to
    for filename in filenames:
      with tf.gfile.GFile(filename, "rb") as f:
        image = f.read()
This error is reported in an issue.
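The same failure can be reproduced without TensorFlow. A JPEG file starts with the bytes 0xFF 0xD8, and 0xFF is never a valid first byte in UTF-8, so decoding the file as text fails at position 0. The sketch below uses Python's built-in open instead of tf.gfile.GFile, on the assumption that the two behave alike under Python 3 in this respect; the helper names are invented for the demonstration.

```python
import os
import tempfile


def make_fake_jpeg():
    """Write the first four bytes of a JPEG header to a temp file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".jpg")
    with os.fdopen(fd, "wb") as f:
        f.write(b"\xff\xd8\xff\xe0")
    return path


def read_text_mode(path):
    """Read the file as UTF-8 text, as mode "r" does."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def read_binary_mode(path):
    """Read the raw bytes, as mode "rb" does."""
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    path = make_fake_jpeg()
    try:
        read_text_mode(path)
    except UnicodeDecodeError as err:
        print("text mode failed:", err)
    print("binary mode read", len(read_binary_mode(path)), "bytes")
    os.remove(path)
```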
5. Identical captions in the output
After running run.sh, all the output captions are identical, and each one is followed by many . and <S> tokens:
    ......
    Captions for image 1.jpg:
      0) a man riding a wave on top of a surfboard . <S> . <S> . <S> <S> . (p=0.001145)
      1) a man riding a wave on top of a surfboard . <S> . <S> <S> . <S> <S> (p=0.000888)
      2) a man riding a wave on top of a surfboard . <S> . <S> <S> . <S> <S> (p=0.000658)
I looked at the code and found that caption_generator.py contains a statement that checks whether a word is the end token </S>:
    ......
    # line 194
    if w == self.vocab.end_id:
      if self.length_normalization_factor > 0:
    ......
This condition always evaluated to False. Comparing the value of w with end_id, I found that w=2 while end_id=3.
I then checked word_counts.txt and found <S> at position 2 and </S> at position 3, which does not match what the model outputs:
    a 969108
    <S> 586368
    </S> 586368
    . 440479
    on 213612
    of 202290
    ......
After swapping these two tokens and rerunning run.sh, the result is normal:
    ......
    Captions for image 1.jpg:
      0) a man riding a wave on top of a surfboard . (p=0.035667)
      1) a person riding a surf board on a wave (p=0.016235)
      2) a man on a surfboard riding a wave . (p=0.010144)
This word_counts.txt came from someone else. I did not expect it to contain an error like this; it really changed my idea of what a bug can be.
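Checking and swapping the two tokens can also be scripted instead of done by hand. The sketch below works on the lines of word_counts.txt, assuming each line has the form "word count" as in the excerpt above; token_positions and swap_tokens are hypothetical helpers. The positions it reports are zero-based line indices, so they may differ by one from positions counted from 1.

```python
def token_positions(lines, tokens=("<S>", "</S>")):
    """Return {token: zero-based index among non-blank lines} for the given tokens."""
    words = [line.split()[0] for line in lines if line.strip()]
    return {token: words.index(token) for token in tokens if token in words}


def swap_tokens(lines, first="<S>", second="</S>"):
    """Return a copy of lines with the two tokens exchanged; counts stay in place."""
    swapped = []
    for line in lines:
        parts = line.split()
        if parts and parts[0] == first:
            parts[0] = second
        elif parts and parts[0] == second:
            parts[0] = first
        swapped.append(" ".join(parts))
    return swapped


if __name__ == "__main__":
    sample = ["a 969108", "<S> 586368", "</S> 586368", ". 440479"]
    print(token_positions(sample))
    print(swap_tokens(sample))
```

To apply it to the real file, read the lines with open(...).read().splitlines(), compare the reported positions with the ids the model emits, and write the swapped lines back only if they disagree.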