A Real-World Implementation of Redundant Multithreading on GPUs
Source: Internet  Editor: 程序博客网  Time: 2024/05/18 01:05
A brief report on the paper "Real-World Design and Evaluation of Compiler-Managed GPU Redundant Multithreading".
The authors' method differs from other work in three ways. First, it assumes that storage in, and transfers to, off-chip resources are already protected, and it targets an on-chip protection domain. Second, the paper focuses on error detection on the GPU rather than on the CPU (RMT was originally used on CPUs). Third, it is a software-only GPU reliability solution rather than a hardware one, since hardware is expensive to implement and inflexible, and results from GPU simulators may not reflect real hardware.
So what is RMT? Redundant multithreading (RMT) is a little like RAID 1, except that threads are duplicated rather than data. GPU faults fall into two fault modes: permanent (hard) faults and transient (soft) faults. Faults originate at the physical level and are hard to avoid, but with RMT the probability of two faults producing identical errors at the same time is small enough to be ignored. RMT replicates every value as it enters the sphere of replication (SoR) and compares the redundant outputs before a single correct copy leaves the SoR, just like RAID 1. In this paper RMT is implemented in GPU kernels written in OpenCL, so a transformation turns OpenCL kernels into RMT programs for error detection.
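The replicate-then-compare idea above can be sketched on the host in a few lines of Python. This is only an illustration of the concept, not the paper's OpenCL transformation; `run_redundant`, `RMTMismatch`, `square_sum`, and the `fault` hook are hypothetical names introduced here.

```python
import copy


class RMTMismatch(Exception):
    """Raised when the two redundant copies disagree at the SoR boundary."""


def run_redundant(compute, value, fault=None):
    """Run `compute` twice on replicated input and compare the outputs.

    `fault` is a hypothetical hook that corrupts one copy's output, so the
    detection path can be exercised without real hardware faults.
    """
    # Replicate the input as it enters the sphere of replication (SoR).
    first_in = copy.deepcopy(value)
    second_in = copy.deepcopy(value)

    first_out = compute(first_in)
    second_out = compute(second_in)
    if fault is not None:
        second_out = fault(second_out)  # simulate a transient fault in one copy

    # Compare before a single copy is allowed to leave the SoR.
    if first_out != second_out:
        raise RMTMismatch(f"copies disagree: {first_out!r} != {second_out!r}")
    return first_out


def square_sum(xs):
    """Example computation: sum of squares."""
    return sum(x * x for x in xs)


def flip_lowest_bit(v):
    """Example fault: flip the least significant bit of an integer result."""
    return v ^ 1
```

Calling `run_redundant(square_sum, [1, 2, 3])` returns the agreed result, while passing `fault=flip_lowest_bit` makes the comparison fail and raises `RMTMismatch`. Note that this detects the error but does not correct it, which matches the detection-only goal described above.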
The first strategy is Intra-Group RMT, which comes in two flavors: Intra-Group+LDS and Intra-Group-LDS. +LDS means the local data share (LDS) is inside the SoR, so it is duplicated and protected; -LDS means it is outside the SoR and is not. Other structures such as the scalar register file (SRF), the scalar unit (SU), and the instruction fetch, decode, and scheduling logic are all outside the SoR and are not protected by Intra-Group RMT. Intra-Group RMT makes three kernel modifications: the work-item ID is modified to create pairs of identical, redundant work-items; when the LDS is included in the SoR, its allocation is doubled and redundant loads and stores are mapped to separate copies; and communication and output comparison are added.
The Intra-Group flavors do not perform very well, because memory operations in DCT and MM take a lot of time. For some applications the inter-work-item communication is also costly, and doubling the size of the work-groups takes time too. Although RMT executes twice as many work-items, the increase in power consumption is small, less than 2%. The cost of redundant computation can be hidden behind Intra-Group RMT latency, while the instruction fetch, scheduling, and decode logic of each CU can be considered inside the SoR.
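To make the work-item ID modification and the output comparison more concrete, here is a small host-side simulation in Python. It assumes that adjacent hardware work-items form a redundant pair (logical ID = hardware ID shifted right by one bit); that mapping, and the names `remap_id`, `launch_intra_group`, `vector_add`, and `inject_fault_at`, are assumptions made for illustration and are not taken from the paper.

```python
A = [1, 2, 3, 4]
B = [10, 20, 30, 40]


def vector_add(logical_id):
    """Example kernel body: one element of A + B per logical work-item."""
    return A[logical_id] + B[logical_id]


def remap_id(hw_id):
    """Map a hardware work-item ID to (logical ID, replica index).

    Assumption: hardware work-items 2k and 2k+1 are the redundant pair
    for logical work-item k.
    """
    return hw_id >> 1, hw_id & 1


def launch_intra_group(kernel, n_logical, inject_fault_at=None):
    """Simulate a launch with twice as many work-items as the original."""
    n_hw = 2 * n_logical  # the work-group size is doubled
    private = [None] * n_hw

    # Every hardware work-item runs the kernel on its logical ID.
    for hw_id in range(n_hw):
        logical_id, _ = remap_id(hw_id)
        private[hw_id] = kernel(logical_id)

    # Hypothetical fault injection: flip one bit in one replica's result.
    if inject_fault_at is not None:
        private[inject_fault_at] ^= 1

    # Output comparison: only values both replicas agree on leave the SoR.
    out = [None] * n_logical
    mismatches = []
    for hw_id in range(0, n_hw, 2):
        logical_id, _ = remap_id(hw_id)
        if private[hw_id] == private[hw_id + 1]:
            out[logical_id] = private[hw_id]
        else:
            mismatches.append(logical_id)
    return out, mismatches
```

A fault-free run, `launch_intra_group(vector_add, 4)`, stores all four sums and reports no mismatches. With `inject_fault_at=5` the pair for logical work-item 2 disagrees, so that output is withheld and the ID is reported.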
The second strategy is Inter-Group RMT. Its kernel modifications are: adding explicit synchronization to coordinate communication between work-items; modifying the work-item ID to avoid deadlock; and placing the communication buffers in global memory, which makes communication between work-items more expensive for Inter-Group RMT than for Intra-Group RMT. The poor performance of Inter-Group RMT is caused by using global memory for inter-work-item communication, whose cost is extremely high. CU under-utilization is also related to the work-groups launched.
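The three Inter-Group modifications can also be sketched as a host-side simulation, with Python threads standing in for work-groups. The sketch assumes that IDs are handed out from a shared counter in scheduling order, so a producer group always receives its ID before the matching consumer group; this ordering scheme and the names `launch_inter_group`, `square`, and `corrupt_group` are assumptions for illustration, not details confirmed by the text above.

```python
import threading


def square(global_id):
    """Example kernel body (integer result so a bit flip can be injected)."""
    return global_id * global_id


def launch_inter_group(kernel, n_groups, group_size, corrupt_group=None):
    """Simulate redundant work-groups that communicate via 'global memory'."""
    ticket_lock = threading.Lock()
    next_ticket = [0]
    comm = [None] * n_groups  # stand-in for global-memory buffers
    ready = [threading.Event() for _ in range(n_groups)]  # explicit sync
    out = [None] * (n_groups * group_size)
    mismatches = []
    mismatch_lock = threading.Lock()

    def work_group():
        # Assumed ID scheme: take the next ticket in scheduling order.
        with ticket_lock:
            ticket = next_ticket[0]
            next_ticket[0] += 1
        group, is_consumer = ticket >> 1, ticket & 1
        base = group * group_size
        values = [kernel(base + i) for i in range(group_size)]

        if not is_consumer:
            if group == corrupt_group:
                values[0] ^= 1  # hypothetical fault in the producer copy
            comm[group] = values  # publish through the shared buffer
            ready[group].set()
        else:
            ready[group].wait()  # wait until the producer has published
            if comm[group] == values:
                for i, v in enumerate(values):
                    out[base + i] = v
            else:
                with mismatch_lock:
                    mismatches.append(group)

    threads = [threading.Thread(target=work_group) for _ in range(2 * n_groups)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out, sorted(mismatches)
```

Every comparison here goes through the shared `comm` buffer and a wait on `ready`, which mirrors why Inter-Group communication is described above as the expensive part. In this simulation all threads are started, so the wait cannot deadlock; on a real GPU the ID assignment is what has to guarantee that.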