A Practical Implementation of Redundant Multithreading on GPUs

Source: Internet · Editor: 程序博客网 · Date: 2024/05/18 01:05

A brief report on the paper "Real-World Design and Evaluation of Compiler-Managed GPU Redundant Multithreading".


This paper introduces the use of Redundant Multithreading (RMT) to provide an efficient software solution for error detection on GPUs. The modifications are made at the kernel level and follow two strategies, Intra-Group RMT and Inter-Group RMT. The paper also shows that GPU RMT performance depends on the unique behavior of each kernel and on the required sphere of replication (SoR).


The authors' method differs from other work in three ways. First, it assumes that storage and transfers to off-chip resources are already protected, and it targets an on-chip protection domain. Second, it focuses on detection on the GPU rather than on the CPU (RMT was originally used on CPUs). Third, it is a software-only GPU reliability solution rather than a hardware one, since hardware solutions are expensive to implement, inflexible, and have to be evaluated on GPU simulators that may not reflect real hardware.


So what is RMT? Redundant Multithreading is a little like RAID1, except that threads are duplicated rather than data. GPU faults fall into two modes: permanent (hard) faults and transient (soft) faults. Faults originate at the physical level and are hard to avoid, but with RMT the probability of two faults producing identical errors at the same time is small enough to ignore. RMT replicates all values when they enter the sphere of replication (SoR) and compares the outputs before a single correct copy leaves the SoR, just like RAID1. In this paper RMT is implemented in GPU kernels written in OpenCL, so a transformation turns ordinary OpenCL kernels into RMT programs that detect errors.
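
The replicate-on-entry, compare-on-exit idea can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the paper's OpenCL transformation: the names `rmt_store` and `FaultDetected` and the `fault` hook are made up for this example.

```python
class FaultDetected(Exception):
    """Raised when the two redundant copies disagree."""


def rmt_store(compute, inputs, memory, address, fault=None):
    """Run `compute` twice on replicated inputs and store only if both agree.

    `fault` is a hypothetical test hook: if given, it is applied to the
    redundant copy's result to mimic a transient (soft) error.
    """
    # Entering the SoR: each copy gets its own replica of the inputs.
    primary_inputs = list(inputs)
    redundant_inputs = list(inputs)

    primary_result = compute(primary_inputs)
    redundant_result = compute(redundant_inputs)
    if fault is not None:
        redundant_result = fault(redundant_result)

    # Leaving the SoR: compare before the value reaches memory.
    if primary_result != redundant_result:
        raise FaultDetected(f"mismatch at address {address}")
    memory[address] = primary_result
    return primary_result
```

Calling `rmt_store(sum, [1, 2, 3], memory, 0)` stores the result once, while passing something like `fault=lambda v: v ^ 1` makes the copies disagree and the store is refused.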


The first strategy is Intra-Group RMT, which comes in two flavors, Intra-Group+LDS and Intra-Group-LDS. +LDS means the LDS is inside the SoR, so it is duplicated and protected, while -LDS means it is outside the SoR and is not. Other structures, such as the scalar register file (SRF), the scalar unit (SU), and the instruction fetch, decode, and scheduling logic, are all outside the SoR and are not protected by Intra-Group RMT. Intra-Group RMT makes three kernel modifications: the work-item ID is modified to create a pair of identical, redundant work-items; when the LDS is included in the SoR, its allocation is doubled and the redundant loads and stores are mapped to their own copy; and communication and output comparison are added.


The Intra-Group flavors do not always perform well. In kernels such as DCT and MM, memory operations take up a lot of time, and for some applications the inter-work-item communication is costly. Doubling the size of the work-groups takes time too. Although RMT executes twice as many work-items, the increase in power consumption is small, less than 2%. The cost of redundant computation can sometimes be hidden behind existing latency in Intra-Group RMT. The instruction fetch, scheduling, and decode logic of each CU can be considered inside the SoR only when the redundant work runs in separate work-groups, as in Inter-Group RMT.
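
The three Intra-Group kernel modifications can be simulated on the host as a rough sketch. Two things here are assumptions made for illustration and are not taken from the paper: adjacent hardware work-items `2k` and `2k+1` are treated as a redundant pair, and each replica uses its own half of the doubled LDS. All function names are hypothetical.

```python
def intra_group_ids(hw_local_id):
    """Map a hardware work-item ID to (logical ID, replica index).

    Assumption: adjacent hardware work-items 2k and 2k+1 form a redundant pair.
    """
    return hw_local_id // 2, hw_local_id % 2


def lds_index(logical_index, replica, lds_words, lds_in_sor):
    """Return the LDS slot a work-item touches.

    With the LDS inside the SoR (+LDS) the allocation is doubled and each
    replica uses its own half. With -LDS both replicas share one copy.
    """
    if lds_in_sor:
        return replica * lds_words + logical_index
    return logical_index


def run_group(original_group_size, kernel, lds_words, lds_in_sor=True):
    """Simulate one doubled work-group and compare each pair's output."""
    lds = [0] * (lds_words * (2 if lds_in_sor else 1))
    outputs = {}
    for hw_id in range(2 * original_group_size):  # work-group size is doubled
        logical_id, replica = intra_group_ids(hw_id)
        slot = lds_index(logical_id % lds_words, replica, lds_words, lds_in_sor)
        lds[slot] = kernel(logical_id)
        outputs.setdefault(logical_id, []).append(lds[slot])
    # Output comparison: keep only results on which both replicas agree.
    agreed = {i: pair[0] for i, pair in outputs.items() if pair[0] == pair[1]}
    return agreed, lds
```

The loop over `2 * original_group_size` is where the doubled work-group size shows up, and the `lds_in_sor` flag switches between the +LDS and -LDS flavors.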


Inter-Group RMT requires its own kernel modifications: explicit synchronization is added to coordinate communication between work-items; the work-item ID is modified to avoid deadlock; and the communication buffers are placed in global memory, so communication between work-items is more expensive in Inter-Group RMT than in Intra-Group RMT. The poor performance of Inter-Group RMT is caused by this use of global memory for inter-work-item communication, whose cost is extremely high. The CU under-utilization that is observed is related to the work-groups that are launched.
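
A small host-side sketch may make the Inter-Group modifications more concrete. It rests on an assumption about how the ID change avoids deadlock: IDs are handed out from a shared counter in the order in which work-groups actually start, so the two halves of a pair always start back to back. That reading, and every name below, is hypothetical and not confirmed by this report.

```python
import itertools

PRODUCER, CONSUMER = 0, 1


def assign_group_roles(arrival_order):
    """Give each launched work-group a (logical group ID, role) from a ticket.

    Assumption: tickets are taken in the order groups start running, so a
    consumer never waits on a producer that has not been scheduled yet.
    """
    counter = itertools.count()  # stands in for an atomic counter in global memory
    roles = {}
    for hw_group in arrival_order:
        ticket = next(counter)
        roles[hw_group] = (ticket // 2, ticket % 2)
    return roles


def inter_group_check(roles, kernel):
    """Producers publish to a global-memory buffer; consumers compare against it."""
    comm_buffer = {}  # logical group ID -> value; models the global-memory buffer
    verified = {}
    # Visit groups in ticket order; on real hardware the consumer would
    # instead wait on a flag until the producer has written its value.
    for logical_id, role in sorted(roles.values()):
        value = kernel(logical_id)
        if role == PRODUCER:
            comm_buffer[logical_id] = value
        else:
            if logical_id not in comm_buffer:
                raise RuntimeError("consumer ran before its producer")
            verified[logical_id] = comm_buffer[logical_id] == value
    return verified
```

Every comparison here goes through `comm_buffer`, the stand-in for global memory, which is the traffic the report blames for the overhead of Inter-Group RMT.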