Multi-GPU processing with data parallelism
If you write your software in a language like C++ for a single CPU core, making it run on multiple GPUs in parallel would require rewriting the software from scratch. But this is not the case with TensorFlow. Because of its symbolic nature, TensorFlow can hide all that complexity, making it effortless to scale your program across many CPUs and GPUs.
Let’s start with the simple example of adding two vectors on CPU:
```python
import tensorflow as tf

with tf.device(tf.DeviceSpec(device_type='CPU', device_index=0)):
    a = tf.random_uniform([1000, 100])
    b = tf.random_uniform([1000, 100])
    c = a + b

tf.Session().run(c)
```
The same thing can be done just as simply on a GPU:
```python
with tf.device(tf.DeviceSpec(device_type='GPU', device_index=0)):
    a = tf.random_uniform([1000, 100])
    b = tf.random_uniform([1000, 100])
    c = a + b
```

But what if we have two GPUs and want to utilize both? To do that, we can split the data and use a separate GPU for processing each half:

```python
split_a = tf.split(a, 2)
split_b = tf.split(b, 2)

split_c = []
for i in range(2):
    with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
        split_c.append(split_a[i] + split_b[i])

c = tf.concat(split_c, axis=0)
```

Let's rewrite this in a more general form so that we can replace addition with any other set of operations:

```python
def make_parallel(fn, num_gpus, **kwargs):
    in_splits = {}
    for k, v in kwargs.items():
        in_splits[k] = tf.split(v, num_gpus)

    out_split = []
    for i in range(num_gpus):
        with tf.device(tf.DeviceSpec(device_type='GPU', device_index=i)):
            with tf.variable_scope(tf.get_variable_scope(), reuse=i > 0):
                out_split.append(fn(**{k: v[i] for k, v in in_splits.items()}))

    return tf.concat(out_split, axis=0)

def model(a, b):
    return a + b

c = make_parallel(model, 2, a=a, b=b)
```
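The invariant make_parallel relies on is that splitting a batch, applying the function to each shard, and concatenating the per-shard results gives the same answer as applying the function to the whole batch. Here is a minimal sketch of that invariant in plain NumPy (NumPy stands in for TensorFlow so it runs anywhere; the elementwise model is just for illustration):

```python
import numpy as np

def make_parallel_np(fn, num_shards, **kwargs):
    # Split every input along the batch axis, one shard per "device".
    in_splits = {k: np.split(v, num_shards) for k, v in kwargs.items()}
    # Apply fn to each shard independently (what each GPU would do).
    out_split = [fn(**{k: v[i] for k, v in in_splits.items()})
                 for i in range(num_shards)]
    # Stitch the per-shard results back into one batch.
    return np.concatenate(out_split, axis=0)

a = np.random.rand(1000, 100)
b = np.random.rand(1000, 100)

def model(a, b):
    return a + b

c = make_parallel_np(model, 2, a=a, b=b)
assert np.array_equal(c, a + b)  # same result as the unsplit computation
```

This only holds for functions that process each batch element independently, which is exactly the condition stated below.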
You can replace the model with any function that takes a set of tensors as input and returns a tensor as its result, provided that both the input and output are batched. Note that we also added a variable scope and set reuse to true for all but the first split. This makes sure that we use the same variables for processing both splits, which will come in handy in our next example.
Let’s look at a slightly more practical example. We want to train a neural network on multiple GPUs. During training we not only need to compute the forward pass but also need to compute the backward pass (the gradients). But how can we parallelize the gradient computation? This turns out to be pretty easy.
Recall from the first item that we wanted to fit a second degree polynomial to a set of samples. We reorganized the code a bit to have the bulk of the operations in the model function:
```python
import numpy as np
import tensorflow as tf

def model(x, y):
    w = tf.get_variable("w", shape=[3, 1])
    f = tf.stack([tf.square(x), x, tf.ones_like(x)], 1)
    yhat = tf.squeeze(tf.matmul(f, w), 1)
    loss = tf.square(yhat - y)
    return loss

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

loss = model(x, y)
train_op = tf.train.AdamOptimizer(0.1).minimize(
    tf.reduce_mean(loss))

def generate_data():
    x_val = np.random.uniform(-10.0, 10.0, size=100)
    y_val = 5 * np.square(x_val) + 3
    return x_val, y_val

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for _ in range(1000):
    x_val, y_val = generate_data()
    _, loss_val = sess.run([train_op, loss], {x: x_val, y: y_val})

print(sess.run(tf.contrib.framework.get_variables_by_name("w")))
```
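As a sanity check on what the training loop should converge to: the samples come from 5x² + 3 with no noise, so an exact degree-2 least-squares fit recovers the coefficients directly. A quick NumPy check (np.polyfit here is a stand-in for the trained w, not part of the original code):

```python
import numpy as np

x_val = np.random.uniform(-10.0, 10.0, size=100)
y_val = 5 * np.square(x_val) + 3

# Closed-form degree-2 least-squares fit; coefficients are ordered
# [quadratic, linear, constant], matching the rows of w in the model above.
w_fit = np.polyfit(x_val, y_val, 2)
print(w_fit)  # close to [5, 0, 3]
```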
Now let’s use make_parallel that we just wrote to parallelize this. We only need to change two lines of code from the above code:
```python
loss = make_parallel(model, 2, x=x, y=y)

train_op = tf.train.AdamOptimizer(0.1).minimize(
    tf.reduce_mean(loss),
    colocate_gradients_with_ops=True)
```
The only thing we need to change to parallelize backpropagation is to set the colocate_gradients_with_ops flag to True. This ensures that each gradient op runs on the same device as the forward op it differentiates, so the backward pass is split across GPUs just like the forward pass.
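Why does this split-and-combine scheme produce the right gradients? Because the training loss is the mean over the batch, the gradient over the full batch equals the (size-weighted) average of the per-shard gradients. A NumPy sketch with a hypothetical one-parameter model, loss = mean((w·x − y)²) so dL/dw = mean(2·x·(w·x − y)):

```python
import numpy as np

def grad(w, x, y):
    # d/dw of mean((w*x - y)^2)
    return np.mean(2.0 * x * (w * x - y))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=1000)
y = 3.0 * x
w = 0.5

# Gradient computed on the full batch.
full = grad(w, x, y)

# Gradient computed per shard (as each GPU would), then averaged.
# The shards are equal-sized, so a plain mean of shard gradients matches.
shards = [grad(w, xs, ys) for xs, ys in zip(np.split(x, 2), np.split(y, 2))]
combined = np.mean(shards)

assert np.allclose(full, combined)
```

This is exactly what happens implicitly in the graph: the shared variable w receives the summed gradients from both splits, and the reduce_mean over the concatenated loss supplies the 1/N weighting.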