Unsuccessful TensorSliceReader constructor: Failed to get matching files

Source: Internet. Editor: 程序博客网. Date: 2024/06/05 21:14

I ran into the following problem while training with distributed TensorFlow:

W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Unsuccessful TensorSliceReader constructor: Failed to get matching files on /tmp/train_logs/model.ckpt-1: Not found: /tmp/train_logs

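The message says that the /tmp/train_logs directory was not found, so the checkpoint file could not be loaded. However, the path and file name were correct, and the files were there when I looked in the directory.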
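A quick way to double-check what is actually on disk is to list every file whose name starts with the checkpoint prefix from the log message. The sketch below is only a diagnostic aid using the standard library; the helper name `list_checkpoint_files` is hypothetical and not part of TensorFlow. It has to be run on the same machine, and from the same working directory, as the process that reports the error.

```python
import os


def list_checkpoint_files(prefix):
    """Return the files on this machine whose names start with `prefix`.

    `prefix` is a checkpoint prefix such as "/tmp/train_logs/model.ckpt-1".
    An empty list means that either the directory or the files are missing
    from the point of view of the current process.
    """
    directory = os.path.dirname(prefix) or "."
    if not os.path.isdir(directory):
        return []
    base = os.path.basename(prefix)
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.startswith(base)
    )


if __name__ == "__main__":
    # Prefix taken from the error message above.
    for path in list_checkpoint_files("/tmp/train_logs/model.ckpt-1"):
        print(path)
```

If this prints nothing on the machine that logs the warning, the process really cannot see the files, even if they exist somewhere else.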

The line that triggers the error is:

sv = tf.train.Supervisor(is_chief=(FLAGS.task_index == 0), logdir="/tmp/train_logs", init_op=init_op, summary_op=summary_op, saver=saver, global_step=global_step, save_model_secs=600)


Solution: https://github.com/tensorflow/tensorflow/issues/6082

@ppwwyyxx

Could you try the following:

  1. use a model name without the character [], and:
  2. when you tried to restore, use the full relative path ./model_epoch10 rather than model_epoch10

If my guess is right then you should see it work. If it doesn't, could you give the steps so that people can reproduce the problem (i.e. how to use your code).
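The two suggestions in the quoted comment can be sketched as small checks on the checkpoint name. This is a minimal illustration, not TensorFlow code: all three helper names are hypothetical, and Python's `fnmatch` is used only as an analogy for glob-style matching. Whether TensorFlow's file matcher treats brackets in exactly the same way is an assumption.

```python
import fnmatch
import os

GLOB_CHARS = set("[]*?")


def has_glob_chars(name):
    """True if `name` contains characters that glob-style matchers treat specially."""
    return any(ch in GLOB_CHARS for ch in name)


def matches_itself_as_pattern(name):
    """True if `name`, read as a glob pattern, still matches the literal name."""
    return fnmatch.fnmatchcase(name, name)


def explicit_relative(path):
    """Prefix a bare relative path with "./"; leave other paths unchanged."""
    if os.path.isabs(path):
        return path
    first = path.replace("\\", "/").split("/", 1)[0]
    if first in (".", ".."):
        return path
    return "./" + path


if __name__ == "__main__":
    for name in ("model_epoch10", "model[1]"):
        print(name, has_glob_chars(name), matches_itself_as_pattern(name))
    print(explicit_relative("model_epoch10"))
```

In Python's matcher, a name such as `model[1]` read as a pattern describes the file `model1`, so it no longer matches itself. That is one plausible reason why a name containing `[]` could produce a "Failed to get matching files" message.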


In the end, I created a new directory named checkpoint under the current directory and changed the failing line to:

sv = tf.train.Supervisor(is_chief=(FLAGS.task_index == 0), logdir="./checkpoint/", init_op=init_op, summary_op=summary_op, saver=saver, global_step=global_step, save_model_secs=600)
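The directory can also be created from the training script itself, before the Supervisor is constructed. The sketch below assumes that creating it up front is harmless in your setup; the helper name `ensure_logdir` is hypothetical. Because a relative path resolves against each process's working directory, the helper also returns the absolute location so it can be logged.

```python
import os


def ensure_logdir(logdir="./checkpoint/"):
    """Create `logdir` if it does not exist and return its absolute path."""
    os.makedirs(logdir, exist_ok=True)
    return os.path.abspath(logdir)


if __name__ == "__main__":
    resolved = ensure_logdir("./checkpoint/")
    print("checkpoints will be written under:", resolved)
    # The relative path is then passed on as in the line above, e.g.
    # sv = tf.train.Supervisor(..., logdir="./checkpoint/", ...)
```

When workers are started from different directories, printing the resolved path on each one shows whether they all point at the same place.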


