Threading and Queues -- TensorFlow


  • Threading and Queues
    • Queue
    • Using Queues
    • Coordinator
    • QueueRunner
    • Handling Exceptions

Threading and Queues

Queue

Queues are a powerful mechanism for asynchronous computation using TensorFlow.

Like everything in TensorFlow, a queue is a node in a TensorFlow graph. It is a stateful node, like a variable: other nodes can modify its content. In particular, nodes can enqueue new items into the queue, or dequeue existing items from the queue.

To get a feel for queues, let's consider a simple example. We will create a "first in, first out" queue (FIFOQueue) and fill it with zeros. Then we'll construct a graph that takes an item off the queue, adds one to that item, and puts it back on the end of the queue. Slowly, the numbers on the queue increase.

q = tf.FIFOQueue(3, "float")
init = q.enqueue_many(([0., 0., 0.],))
x = q.dequeue()
y = x + 1
q_inc = q.enqueue([y])

init.run()
q_inc.run()
q_inc.run()
q_inc.run()
q_inc.run()

In the code above, each q_inc run takes a value off the queue, adds 1, and puts the result back at the end of the queue.
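Note that the run() calls above assume a default session is installed. Below is a minimal sketch of the same example with an explicit tf.InteractiveSession, which registers itself as the default session:

import tensorflow as tf

q = tf.FIFOQueue(3, "float")
init = q.enqueue_many(([0., 0., 0.],))
x = q.dequeue()
y = x + 1
q_inc = q.enqueue([y])

sess = tf.InteractiveSession()  # becomes the default session used by .run()
init.run()
for _ in range(3):
    q_inc.run()               # dequeue a value, add 1, enqueue it at the back
print(sess.run(q.dequeue()))  # prints 1.0 after three increments of the zeros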

Enqueue, EnqueueMany, and Dequeue are special nodes. They take a pointer to the queue instead of a normal value, allowing them to change it. We recommend you think of these as being like methods of the queue. In fact, in the Python API, they are methods of the queue object (e.g. q.enqueue(...)).

Using Queues

Queues, such as tf.FIFOQueue and tf.RandomShuffleQueue, are important TensorFlow objects for computing tensors asynchronously in a graph.

For example, a typical input architecture is to use a RandomShuffleQueue to prepare inputs for training a model (a minimal constructor sketch follows the list below):

1. Multiple threads prepare training examples and push them in the queue.
2. A training thread executes a training op that dequeues mini-batches from the queue.
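For reference, here is a minimal sketch of constructing such a queue. The capacity, min_after_dequeue, batch size, and element shape are hypothetical, chosen only to make the snippet self-contained:

import tensorflow as tf

# Hypothetical sizes and shapes, for illustration only.
example = tf.random_normal([28, 28])   # stand-in for a real preprocessed example
batch_size = 32

queue = tf.RandomShuffleQueue(
    capacity=1000,          # maximum number of elements the queue can hold
    min_after_dequeue=100,  # keep at least this many elements for good shuffling
    dtypes=[tf.float32],
    shapes=[[28, 28]])

enqueue_op = queue.enqueue(example)       # run by the filler threads (step 1)
inputs = queue.dequeue_many(batch_size)   # run by the training thread (step 2)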

The TensorFlow Session object is multithreaded, so multiple threads can easily use the same session and run ops in parallel. However, it is not always easy to implement a Python program that drives threads as described above. All threads must be able to stop together, exceptions must be caught and reported, and queues must be properly closed when stopping.
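As a minimal sketch of that first point (assuming a trivial toy graph), several Python threads can call run() on the same Session object and the ops execute in parallel:

import threading
import tensorflow as tf

x = tf.placeholder(tf.float32)
y = x * 2.0

sess = tf.Session()

def worker(value):
    # Every thread drives the same Session object.
    print(sess.run(y, feed_dict={x: value}))

threads = [threading.Thread(target=worker, args=(float(i),)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()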

TensorFlow provides two classes to help: tf.train.Coordinator and tf.train.QueueRunner. These two classes are designed to be used together. The Coordinator class helps multiple threads stop together and report exceptions to a program that waits for them to stop. The QueueRunner class is used to create a number of threads cooperating to enqueue tensors in the same queue.

Coordinator

The Coordinator helps multiple threads stop together. It has three key methods:

- tf.train.Coordinator.should_stop: returns True if the threads should stop.
- tf.train.Coordinator.request_stop: requests that threads should stop.
- tf.train.Coordinator.join: waits until the specified threads have stopped.

You first create a Coordinator object, and then create a number of threads that use the coordinator. The threads typically run loops that stop when should_stop() returns True.

Any thread can decide that the computation should stop. It only has to call request_stop(), and the other threads will stop, as should_stop() will then return True. (Very convenient.)

# Thread body: loop until the coordinator indicates a stop was requested.
# If some condition becomes true, ask the coordinator to stop.
def MyLoop(coord):
  while not coord.should_stop():
    ...do something...
    if ...some condition...:
      coord.request_stop()

# Main thread: create a coordinator.
coord = tf.train.Coordinator()

# Create 10 threads that run 'MyLoop()'.
threads = [threading.Thread(target=MyLoop, args=(coord,)) for i in xrange(10)]

# Start the threads and wait for all of them to stop.
for t in threads:
  t.start()
coord.join(threads)

QueueRunner

The QueueRunner class creates a number of threads that repeatedly run an enqueue op. These threads can use a coordinator to stop together. In addition, a queue runner runs a closer thread that automatically closes the queue if an exception is reported to the coordinator.

To use it, first create a TensorFlow queue for the input examples, together with enqueue and dequeue ops; train_op then uses the dequeued inputs for the training step:

example = ...ops to create one example...
# Create a queue, and an op that enqueues examples one at a time in the queue.
queue = tf.RandomShuffleQueue(...)
enqueue_op = queue.enqueue(example)
# Create a training graph that starts by dequeuing a batch of examples.
inputs = queue.dequeue_many(batch_size)
train_op = ...use 'inputs' to build the training part of the graph...

First build a graph that uses a TensorFlow queue (e.g. a tf.RandomShuffleQueue) for input examples. Add ops that process examples and enqueue them in the queue. Add training ops that start by dequeueing from the queue.

Then use a queue runner to create multiple threads that enqueue examples in parallel:

# Create a queue runner that will run 4 threads in parallel to enqueue
# examples.
qr = tf.train.QueueRunner(queue, [enqueue_op] * 4)

# Launch the graph.
sess = tf.Session()
# Create a coordinator, launch the queue runner threads.
coord = tf.train.Coordinator()
enqueue_threads = qr.create_threads(sess, coord=coord, start=True)
# Run the training loop, controlling termination with the coordinator.
for step in xrange(1000000):
    if coord.should_stop():
        break
    sess.run(train_op)
# When done, ask the threads to stop.
coord.request_stop()
# And wait for them to actually do it.
coord.join(enqueue_threads)

In the Python training program, create a QueueRunner that will run a few threads to process and enqueue examples. Create a Coordinator and ask the queue runner to start its threads with the coordinator. Write a training loop that also uses the coordinator.

Handling Exceptions

Finally, add exception handling around the training loop:

try:
    for step in xrange(1000000):
        if coord.should_stop():
            break
        sess.run(train_op)
except Exception as e:
    # Report exceptions to the coordinator.
    coord.request_stop(e)
finally:
    # Terminate as usual. It is safe to call `coord.request_stop()` twice.
    coord.request_stop()
    coord.join(enqueue_threads)