The Evolution of the Nvidia GPU Memory Architecture
Source: Internet. Editor: 程序博客网. Time: 2024/05/24 05:41
After going through a lot of papers and Nvidia's whitepapers, I finally got this straight.
From Fermi to Pascal, the cache hierarchy changed as follows.
1. Fermi
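- The L1 data cache shares a configurable 64 KB with shared memory, usually split 16/48 or 48/16 (in KB); it accepts both read and write requests.
- There is also a dedicated texture cache for graphics rendering and a constant cache for constants; both are read-only.
- All of these L1-level caches are private to each SM. To avoid cache coherence problems, write requests are not cached in the L1 data cache either.
In short, the L1 level effectively behaves as read-only.
L2 is shared by all SMs and is readable and writable.
- When data in L2 is written and a copy of that data still sits in L1, the L1 copy is invalidated, which preserves cache coherency.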
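The configurable L1 / shared memory split described above can be requested per kernel from the CUDA runtime. The following is a minimal sketch, not taken from the original post: the kernel `scale` and the helper `run_scale_demo` are hypothetical names, and the request made through `cudaFuncSetCacheConfig` is only a preference that the driver may ignore on hardware where the split is fixed.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: it streams through global memory and uses no shared
// memory, so the larger-L1 split is the more plausible choice for it.
__global__ void scale(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: returns true if the kernel ran and doubled every element.
bool run_scale_demo() {
    const int n = 1 << 16;
    std::vector<float> host(n, 1.0f);
    float* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Ask for the larger-L1 split (48 KB L1 / 16 KB shared on Fermi).
    // cudaFuncCachePreferShared would request the opposite split.
    cudaFuncSetCacheConfig(scale, cudaFuncCachePreferL1);

    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; ok && i < n; ++i) ok = (host[i] == 2.0f);
    return ok;
}
```

Calling `run_scale_demo()` from a `main` function is enough to try it. A kernel that leans heavily on shared memory would more likely want `cudaFuncCachePreferShared` instead.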
2. Kepler
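The cache hierarchy is largely inherited from Fermi. The new feature relative to Fermi is an additional 48 KB read-only data cache,
dedicated to caching read-only data.
Everything else is the same as in Fermi.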
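As an illustration of how a kernel can route loads through Kepler's read-only data cache, here is a minimal sketch. The kernel `saxpy_ro` and the helper `run_saxpy_demo` are hypothetical names. The `__ldg()` intrinsic needs compute capability 3.5 or higher, so the sketch falls back to a plain load elsewhere. Marking the pointer `const ... __restrict__` is a hint that lets the compiler choose the read-only path on its own, but it is not a guarantee.

```cuda
#include <vector>
#include <cuda_runtime.h>

// x is never written by the kernel, so it is a candidate for the read-only cache.
__global__ void saxpy_ro(const float* __restrict__ x, float* __restrict__ y,
                         float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
#if __CUDA_ARCH__ >= 350
        float xi = __ldg(&x[i]);   // explicit load via the read-only data cache
#else
        float xi = x[i];           // plain global load on older architectures
#endif
        y[i] = a * xi + y[i];
    }
}

// Hypothetical helper: returns true if y = 3 * 1 + 2 = 5 for every element.
bool run_saxpy_demo() {
    const int n = 1 << 16;
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);
    float *dx = nullptr, *dy = nullptr;
    if (cudaMalloc(&dx, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&dy, n * sizeof(float)) != cudaSuccess) { cudaFree(dx); return false; }
    cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    saxpy_ro<<<(n + 255) / 256, 256>>>(dx, dy, 3.0f, n);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    cudaMemcpy(hy.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);

    for (int i = 0; ok && i < n; ++i) ok = (hy[i] == 5.0f);
    return ok;
}
```

The data read through `__ldg()` must stay unmodified for the lifetime of the kernel, which is why only `x`, not `y`, goes through it here.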
3. Maxwell
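This time Nvidia made a larger change: it gave up writes at the L1 level entirely and unified the L1 data cache with the texture cache and the other dedicated caches;
how the unified cache is used depends on the workload.
Global loads are cached in L2 only.
Local loads are cached in L2 only.
Original text from the manual: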
Maxwell combines the functionality of the L1 and texture caches into a single unit. As with Kepler, global loads in Maxwell are cached in L2 only, unless using the LDG read-only data cache mechanism introduced in Kepler. In a manner similar to Kepler GK110B, GM204 retains this behavior by default but also allows applications to opt-in to caching of global loads in its unified L1/Texture cache. The opt-in mechanism is the same as with GK110B: pass the -Xptxas -dlcm=ca flag to nvcc at compile time. Local loads also are cached in L2 only, which could increase the cost of register spilling if L1 local load hit rates were high with Kepler. The balance of occupancy versus spilling should therefore be reevaluated to ensure best performance. Especially given the improvements to arithmetic latencies, code built for Maxwell may benefit from somewhat lower occupancy (due to increased registers per thread) in exchange for lower spilling. The unified L1/texture cache acts as a coalescing buffer for memory accesses, gathering up the data requested by the threads of a warp prior to delivery of that data to the warp. This function previously was served by the separate L1 cache in Fermi and Kepler.
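The `-dlcm` flag quoted above changes the default caching mode for every global load in a compilation unit. A per-load alternative is sketched below using the CUDA cache-hint load intrinsics `__ldcg()` (cache in L2 only) and `__ldca()` (cache at all levels). This is an illustration rather than something the quoted manual prescribes: the kernel and helper names are hypothetical, the intrinsics are assumed to need compute capability 5.0 or higher (hence the fallback), and whether an "all levels" hint actually reaches the unified L1/texture cache depends on the specific chip.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Whole-file alternative (GM204 example; the architecture flag is an assumption):
//   nvcc -arch=sm_52 -Xptxas -dlcm=ca example.cu

// Load with an "L2 only" cache hint where the intrinsic is available.
__device__ __forceinline__ float load_l2_only(const float* p) {
#if __CUDA_ARCH__ >= 500
    return __ldcg(p);
#else
    return *p;
#endif
}

// Load with a "cache at all levels" hint where the intrinsic is available.
__device__ __forceinline__ float load_all_levels(const float* p) {
#if __CUDA_ARCH__ >= 500
    return __ldca(p);
#else
    return *p;
#endif
}

// Hypothetical kernel: reads the same input through both kinds of load.
__global__ void copy_both(const float* in, float* out_cg, float* out_ca, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out_cg[i] = load_l2_only(in + i);
        out_ca[i] = load_all_levels(in + i);
    }
}

// Hypothetical helper: the hints affect caching only, so both outputs must match the input.
bool run_cache_hint_demo() {
    const int n = 1 << 16;
    std::vector<float> h(n, 7.0f), a(n, 0.0f), b(n, 0.0f);
    float *in = nullptr, *cg = nullptr, *ca = nullptr;
    if (cudaMalloc(&in, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&cg, n * sizeof(float)) != cudaSuccess) { cudaFree(in); return false; }
    if (cudaMalloc(&ca, n * sizeof(float)) != cudaSuccess) { cudaFree(in); cudaFree(cg); return false; }
    cudaMemcpy(in, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    copy_both<<<(n + 255) / 256, 256>>>(in, cg, ca, n);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    cudaMemcpy(a.data(), cg, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(b.data(), ca, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(in); cudaFree(cg); cudaFree(ca);

    for (int i = 0; ok && i < n; ++i) ok = (a[i] == 7.0f && b[i] == 7.0f);
    return ok;
}
```

The two loads return identical values; only the caching behavior can differ, so any benefit has to be measured with a profiler rather than inferred from the results.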
4. Pascal
Apart from a larger L2 cache, the cache architecture is inherited from the previous generation, Maxwell.