Overview of recent PCM-related papers (May 2 to June 15)

Source: Internet. Posted by: r软件的使用. Editor: 程序博客网. Time: 2024/05/16 01:13

[PDF] CLOCK-DWF: A Write-History-Aware Page Replacement Algorithm for Hybrid PCM and DRAM Memory Architectures

Abstract:

Phase change memory has emerged as one of the most promising technologies to incorporate into the memory hierarchy of future computer systems. However, PCM has two critical weaknesses to substitute DRAM memory in its entirety. First, the number of write operations allowed to each PCM cell is limited. Second, write access time of PCM is about 6-10 times slower than that of DRAM. To cope with this situation, hybrid memory architectures that use a small amount of DRAM together with PCM have been suggested. This paper presents a new memory management technique for hybrid PCM and DRAM memory architecture that efficiently hides the slow write performance of PCM. Specifically, we aim to estimate future write references accurately and then absorb frequent memory writes into DRAM. To do this, we analyze the characteristics of memory write references and find two noticeable phenomena. First, using write history alone performs better than using both read and write history in estimating future write references. Second, the frequency characteristic is a better estimator than temporal locality in predicting future memory writes. Based on these two observations, we present a new page replacement algorithm called CLOCK-DWF that significantly reduces the number of writes that occur on PCM and also increases the lifespan of PCM memory.

Publication: IEEE Transactions on Computers (Volume: PP, Issue: 99)

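Contribution: PCM is one of the most promising memory technologies and is expected to be integrated into the memory hierarchy of future computer systems. Compared with DRAM, however, PCM has two major drawbacks: each PCM cell tolerates only a limited number of writes, and PCM writes are 6 to 10 times slower than DRAM writes. To address this, researchers have proposed hybrid memory architectures that pair a small amount of DRAM with PCM.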

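This paper proposes a new memory management technique for such hybrid PCM/DRAM systems that effectively hides PCM's slow writes, and it reports two interesting observations. First, using write history alone predicts future write references better than combining read and write history. Second, access frequency predicts future memory writes better than temporal locality does. Building on these two findings, the authors propose a new page replacement algorithm, CLOCK-DWF, which substantially reduces the number of writes reaching PCM and extends PCM lifetime.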
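The two observations can be illustrated with a toy write-frequency CLOCK for the DRAM side of a hybrid memory. This is a simplified sketch, not the paper's CLOCK-DWF: it assumes only write requests are routed to DRAM, keeps a per-page write counter instead of a single reference bit, and lets the clock hand decrement counters until it finds a page whose count is zero. The class and method names are hypothetical.

```python
class WriteFrequencyClock:
    """Toy DRAM buffer in a hybrid DRAM/PCM memory (hypothetical sketch).

    Only writes are tracked (write history), and each resident page keeps a
    write counter (frequency) instead of a single reference bit (recency).
    """

    def __init__(self, capacity):
        self.slots = [None] * capacity  # DRAM frames arranged as a clock
        self.writes = {}                # resident page -> write counter
        self.hand = 0
        self.evicted = []               # pages written back to PCM, in order

    def write(self, page):
        if page in self.writes:         # hit: the write is absorbed by DRAM
            self.writes[page] += 1
            return
        slot = self._find_victim_slot()
        old = self.slots[slot]
        if old is not None:             # the dirty victim goes back to PCM
            del self.writes[old]
            self.evicted.append(old)
        self.slots[slot] = page
        self.writes[page] = 1
        self.hand = (slot + 1) % len(self.slots)

    def _find_victim_slot(self):
        if None in self.slots:          # a free frame is still available
            return self.slots.index(None)
        while True:
            page = self.slots[self.hand]
            if self.writes[page] == 0:
                return self.hand
            # age the counter: frequently written pages survive more sweeps
            self.writes[page] -= 1
            self.hand = (self.hand + 1) % len(self.slots)


dram = WriteFrequencyClock(capacity=2)
for page in ["A", "A", "A", "B", "C"]:
    dram.write(page)
print(dram.evicted)  # ['B']
```

With two frames, the page written three times stays in DRAM and the page written once is sent back to PCM; under a one-reference-bit CLOCK both pages would look equally recent.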

[PDF] Curling-PCM: Application-Specific Wear Leveling for Phase Change Memory based Embedded Systems

Abstract:

Phase change memory (PCM) has been used as NOR flash replacement in embedded systems with its attractive features. However, the endurance of PCM keeps drifting down and greatly limits its adoption in embedded systems. As most embedded systems are application-oriented, we can better utilize PCM by exploring application-specific features such as fixed access patterns and update frequencies to prolong the lifetime of PCM. In this paper, we propose an application-specific wear-leveling technique, called Curling-PCM, to evenly distribute write activities across the PCM chip in order to improve the endurance of PCM. The basic idea is to exploit application-specific features in embedded systems and periodically move the hot region across the whole PCM chip. To further reduce the overhead of moving the hot region and improve the performance of PCM-based embedded systems, a fine-grained partial wear leveling policy is proposed in Curling-PCM, by which only part of the hot region is moved during each request handling period. The experimental results show that Curling-PCM can effectively evenly distribute write traffic in PCM chips compared with previous work. We expect this work can serve as a first step towards the full exploration of application-specific features in PCM-based embedded systems.

Publication: 2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Contribution: a wear-leveling algorithm.

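The paper develops an application-specific wear-leveling algorithm for embedded systems that periodically moves the hot region across the chip. To further reduce the overhead of moving the hot region and to improve the performance of PCM-based embedded systems, it proposes a fine-grained partial wear-leveling policy.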
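A toy simulation makes the idea of moving the hot region concrete. It is an assumption-laden sketch, not Curling-PCM itself: the workload (a fixed set of hot lines rewritten round-robin), the chip size, and the move period are hypothetical, and each move relocates only one hot line (a partial move in the spirit of the fine-grained policy), at the cost of two extra writes.

```python
def simulate_wear(n_lines, hot_lines, hot_writes, period, curl=True):
    """Per-line write counts for a toy PCM chip whose hot region drifts.

    Hypothetical model: the application rewrites `hot_lines` logical lines
    round-robin; with curl=True, every `period` writes the hot line at the
    tail of the region moves to the first cold line past its head.
    """
    assert hot_lines < n_lines
    wear = [0] * n_lines
    hot_map = list(range(hot_lines))  # hot logical line -> physical line
    tail = 0                          # physical line at the tail of the region
    for i in range(hot_writes):
        wear[hot_map[i % hot_lines]] += 1
        if curl and (i + 1) % period == 0:
            nxt = (tail + hot_lines) % n_lines  # first cold line past the head
            k = hot_map.index(tail)             # which hot line sits at the tail
            hot_map[k] = nxt                    # copy it past the head ...
            wear[nxt] += 1
            wear[tail] += 1                     # ... and the displaced cold line into the gap
            tail = (tail + 1) % n_lines
    return wear


static = simulate_wear(16, 4, 1600, period=10, curl=False)
curled = simulate_wear(16, 4, 1600, period=10)
print(max(static), max(curled))
```

Without movement every write lands on the same four lines; with movement the hot region sweeps the whole chip, so peak per-line wear drops even after the migration writes are counted.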

A SELECTIVE READ-BEFORE-WRITE SCHEME FOR ENERGY-AWARE SPIN TORQUE TRANSFER RAM (STT-RAM) CACHE DESIGN

Abstract:

Due to its low leakage power and high density, spin torque transfer RAM (STT-RAM) has become a good candidate for future on-chip cache. However, STT-RAM suffers from higher write energy compared to the SRAM. One state-of-the-art technique to alleviate this problem is read-before-write (RBW). In this paper, we study the pattern of the write accesses to the L2 cache and show that directly applying the RBW to a STT-RAM L2 cache can be problematic from energy perspective. We then propose a selective read-before-write (SRW) scheme to further reduce the dynamic write energy of the STT-RAM cache. Additional optimizations are included in the design of SRW so that it can save a considerable amount of energy at negligible overheads. The experimental results show that SRW achieves a 86.0% reduction in write energy consumption vs. a baseline without any write optimization techniques, and a 6.55% more reduction compared to the RBW scheme.

Read More: http://www.worldscientific.com/doi/abs/10.1142/S0218126613500382

Publication: Journal of Circuits, Systems and Computers, Volume 22, Issue 05, June 2013
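Read-before-write saves energy only when few bits change, which is why applying it blindly can backfire. The sketch below uses hypothetical energy numbers (one read costs 20 units, one programmed bit costs 1 unit, 64-bit words) and compares a conventional write, RBW, and an oracle that picks the cheaper option for each write. The oracle is only an upper bound for a selective scheme; it is not the SRW policy from the paper, whose selection criteria are not described here.

```python
WIDTH = 64     # bits per word (hypothetical)
E_READ = 20.0  # hypothetical energy of one read, arbitrary units
E_BIT = 1.0    # hypothetical energy to program one bit


def bit_flips(old, new):
    return bin(old ^ new).count("1")


def energy_plain(old, new):
    """Conventional write: program every bit of the word."""
    return WIDTH * E_BIT


def energy_rbw(old, new):
    """Read-before-write: read the old word, then program only the differing bits."""
    return E_READ + bit_flips(old, new) * E_BIT


def energy_selective_oracle(old, new):
    """Upper bound for a selective scheme: choose the cheaper option per write."""
    return min(energy_plain(old, new), energy_rbw(old, new))


few = (0x0, 0xFF)               # 8 bits change
most = (0x0, (1 << WIDTH) - 1)  # all 64 bits change
print(energy_plain(*few), energy_rbw(*few))    # 64.0 28.0
print(energy_plain(*most), energy_rbw(*most))  # 64.0 84.0
```

With these numbers RBW wins only when fewer than WIDTH - E_READ / E_BIT = 44 bits change, so a selective scheme needs some way to predict which writes fall on each side of that break-even point.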

[PDF] Investigation of the emerging physical mechanisms limiting the reliability of nanoscale Flash memories

A PhD thesis from an Italian university; hardware-oriented.

Remaining details omitted.

Hardware-Assisted Cooperative Integration of Wear-Leveling and Salvaging for Phase Change Memory

L Jiang, Y Du, B Zhao, Y Zhang, BR Childers… - ACM Transactions on …, 2013 - dl.acm.org

Abstract: Phase Change Memory (PCM) has recently emerged as a promising memory technology. However, PCM's limited write endurance restricts its immediate use as a replacement for DRAM. To extend the lifetime of PCM chips, wear-leveling and salvaging ...

Publication: ACM Transactions on Architecture and Code Optimization (TACO), Volume 10, Issue 2, May 2013

Bridging the programming gap between persistent and volatile memory using WrAP

Abstract:

Advances in memory technology are promising the availability of byte-addressable persistent memory as an integral component of future computing platforms. This change has significant implications for software that has traditionally made a sharp distinction between durable and volatile storage. In this paper we describe a software-hardware architecture, WrAP, for persistent memory that provides atomicity and durability while simultaneously ensuring that fast paths through the cache, DRAM, and persistent memory layers are not slowed down by burdensome buffering or double-copying requirements. Trace-driven simulation of transactional data structures indicate the potential for significant performance gains using the WrAP approach.

Publication: CF '13: Proceedings of the ACM International Conference on Computing Frontiers, Article No. 30

This paper is an interesting read and tells a fairly complete story, so here is an outline.

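To speed things up, recent data management systems commonly take one of two approaches:

1. Use DRAM servers. An example is memcached, which logically pools the DRAM of many nodes so that more memory is available than any single node has.

2. Use main-memory database techniques so that data is operated on almost entirely in memory. An example is IBM's solidDB, which keeps multiple in-memory copies synchronized with each other.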

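However, DRAM loses its contents when power fails. To guarantee that no data is lost when the system crashes, these systems must pay extra overhead to maintain copies of the data on non-volatile storage.

Moreover, after a crash or a planned maintenance shutdown, recovery takes a long time, because the in-memory data must be rebuilt from non-volatile storage.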

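The paper then spends a passage arguing that storage class memory (SCM) is everyone's hope, the single spark that could start a prairie fire.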

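To guarantee persistence, approaches 1 and 2 above must add checkpointing or logging. A recent solution, RAMCloud, instead keeps multiple copies of the data in the memory of different servers,

which consumes a lot of energy.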

Overview
To implement persistence correctly, software must solve three problems:
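1. Persistence ordering. Ordering constraints must be imposed on updates to persistent data structures, because a failure can occur at any moment. For example, pointing a persistent pointer at a storage block that has not yet been initialized can cause an undetectable error if the system crashes between the block's initialization and the pointer update. Swapping the order of the updates, so that the block is initialized before the pointer is set, keeps the data consistent even after a failure and restart.
Note that persistence ordering requires every update to reach persistent memory in the prescribed order. Merely making the updates globally visible, as a typical memory consistency protocol does, is not enough; additional hardware support is needed, which is discussed in Section 2.1.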
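The block-then-pointer example can be checked mechanically. The toy model below assumes that a store stays in a volatile cache until `persist()` is called (standing in for a cache-line flush plus a fence that waits for completion) and that a crash discards the cache; it then crashes after every step of both orderings. All names are hypothetical.

```python
class ToyPersistentMemory:
    """Stores sit in a volatile cache until persisted; a crash loses the cache."""

    def __init__(self):
        self.cache = {}
        self.durable = {}

    def store(self, addr, value):
        self.cache[addr] = value

    def persist(self, addr):
        # stand-in for "flush the line, then fence until it reaches PM"
        self.durable[addr] = self.cache.pop(addr)


def crash_after(steps, n):
    """Run the first n steps, then lose power and return what survived."""
    pm = ToyPersistentMemory()
    for step in steps[:n]:
        step(pm)
    return dict(pm.durable)


def consistent(state):
    # invariant: a durable pointer must name a durable, initialized block
    return "ptr" not in state or state["ptr"] in state


init_block = [lambda pm: pm.store("blk", 42), lambda pm: pm.persist("blk")]
publish_ptr = [lambda pm: pm.store("ptr", "blk"), lambda pm: pm.persist("ptr")]

safe = init_block + publish_ptr    # initialize first, then publish
unsafe = publish_ptr + init_block  # publish first: the bug from the text

print(all(consistent(crash_after(safe, n)) for n in range(5)))    # True
print(all(consistent(crash_after(unsafe, n)) for n in range(5)))  # False
```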

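2. Persistence atomicity. Transactional semantics require that updates to a group of related records appear as a single indivisible operation: either all of the records are updated or none are. Because a failure can occur at any time, the system must either back up a partial set of updates or defer the updates until all the values have been recorded in a power-failure-safe region. Traditional software performs transactional updates through system calls into the underlying file system or database, relying on disk-based logging or copy-on-write to ensure that updates are applied indivisibly and remain recoverable once the transaction commits.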
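A common way to get this all-or-nothing behavior is redo logging: write all new values to a log, write a commit record, then update in place, and on recovery replay the log if the commit record is set. The sketch below is a generic illustration, not WrAP's protocol. It assumes each assignment to the dictionary is one durable write that reaches persistent memory in program order (the guarantee item 1 asks for), and it injects a crash before every possible write.

```python
def transact(store, updates, crash_at=None):
    """Apply `updates` atomically with a redo log (generic sketch).

    Each assignment to `store` is treated as one ordered, durable write;
    `crash_at` simulates power loss just before the write with that index.
    """
    writes = [("log", dict(updates)), ("committed", True)]  # log, then commit record
    writes += list(updates.items())                          # in-place updates
    writes.append(("committed", False))                      # retire the log
    for i, (key, value) in enumerate(writes):
        if i == crash_at:
            return
        store[key] = value


def recover(store):
    if store.get("committed"):
        store.update(store["log"])  # replaying a redo log is idempotent
        store["committed"] = False


outcomes = set()
for crash_at in range(6):
    store = {"a": 1, "b": 2}
    transact(store, {"a": 10, "b": 20}, crash_at)
    recover(store)
    outcomes.add((store["a"], store["b"]))
print(sorted(outcomes))  # [(1, 2), (10, 20)]
```

Whatever the crash point, recovery leaves either none or all of the updates in place, never a mix.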

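3. Persistence protection. Programming bugs are hard to detect in a persistent memory system. First, because the data persists, simply rebooting no longer returns memory to a consistent state. Second, pointer dependencies between data structures cross the boundary between volatile and non-volatile memory, which makes robust programming considerably harder.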
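One pitfall behind the third problem is a pointer from persistent memory into volatile memory: after a reboot the pointer survives but its target does not. A minimal sketch of a runtime guard against that case, assuming each object records whether it lives in persistent memory (the `Obj` type and the `set_ref` check are hypothetical, not WrAP's mechanism):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Obj:
    name: str
    persistent: bool            # True if the object lives in persistent memory
    ref: Optional["Obj"] = None


def set_ref(src, dst):
    """Hypothetical guard: forbid persistent -> volatile pointers.

    Such a pointer survives a reboot while its target does not, so it would dangle.
    """
    if src.persistent and dst is not None and not dst.persistent:
        raise ValueError(f"{src.name} (persistent) must not point to {dst.name} (volatile)")
    src.ref = dst


root = Obj("root", persistent=True)
tmp = Obj("tmp", persistent=False)
set_ref(tmp, root)  # volatile -> persistent: allowed
try:
    set_ref(root, tmp)
except ValueError as err:
    print(err)  # root (persistent) must not point to tmp (volatile)
```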

Next, let us look at how earlier approaches addressed these three problems.
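In BPFS, a new mechanism called epoch barriers enforces update ordering: each cache line is tagged with an epoch number, and the cache hardware is modified so that memory write-backs always occur in epoch order.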
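A simplified software model of epoch barriers: each dirty line carries the epoch in which it was written, and evicting a line first writes back every line from an older epoch. This is a sketch under stated assumptions (a single cache, no concurrency, hypothetical class and method names), not BPFS's hardware design.

```python
class EpochCache:
    """Simplified epoch-barrier model: write-backs to PM respect epoch order."""

    def __init__(self):
        self.epoch = 0
        self.dirty = {}     # addr -> (epoch, value)
        self.pm_order = []  # (epoch, addr) in the order lines reach PM

    def store(self, addr, value):
        if addr in self.dirty and self.dirty[addr][0] < self.epoch:
            self.evict(addr)  # don't let an old-epoch line absorb a newer store
        self.dirty[addr] = (self.epoch, value)

    def barrier(self):
        self.epoch += 1

    def evict(self, addr):
        e = self.dirty[addr][0]
        older = [a for a, (ea, _) in self.dirty.items() if ea < e]
        for a in sorted(older, key=lambda x: self.dirty[x][0]):
            self._write_back(a)  # older epochs must reach PM first
        self._write_back(addr)

    def _write_back(self, addr):
        e, _ = self.dirty.pop(addr)
        self.pm_order.append((e, addr))


cache = EpochCache()
cache.store("blk", 42)      # epoch 0: initialize the block
cache.barrier()
cache.store("ptr", "blk")   # epoch 1: publish the pointer
cache.evict("ptr")          # the cache wants to evict the pointer first ...
print(cache.pm_order)       # [(0, 'blk'), (1, 'ptr')]
```

Even though the pointer's line is evicted first, the block from the earlier epoch reaches persistent memory before it.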

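In Mnemosyne, the order of persistent writes is enforced with
1. non-cached write modes,
2. cache-line flush operations, and
3. memory barriers (fence instructions).
However, a fence instruction only guarantees global visibility. It must be strengthened so that completion of the fence implies that all pending writes have also been committed to persistent memory.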

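A lightweight hardware mechanism called cache line counters lets software check whether all writes in a given set have been committed to memory, and delay operations that depend on those writes.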
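The cache line counter idea can be modeled as a per-set count of writes still in flight, with dependent work deferred until the count drops to zero. The class below is a hypothetical software stand-in for what the text describes as a hardware mechanism; its names and callback interface are assumptions.

```python
class CommitCounter:
    """Software stand-in for a per-set counter of writes not yet in PM."""

    def __init__(self):
        self.in_flight = 0
        self.deferred = []  # operations waiting for the set to drain

    def issue_write(self):
        self.in_flight += 1

    def write_committed(self):
        self.in_flight -= 1
        if self.in_flight == 0:
            ready, self.deferred = self.deferred, []
            for op in ready:
                op()

    def after_commit(self, op):
        """Run op now if nothing is pending, otherwise once all writes commit."""
        if self.in_flight == 0:
            op()
        else:
            self.deferred.append(op)


log = []
counter = CommitCounter()
counter.issue_write()  # e.g. two log-record lines in flight
counter.issue_write()
counter.after_commit(lambda: log.append("write commit record"))
counter.write_committed()
print(log)  # []
counter.write_committed()
print(log)  # ['write commit record']
```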

A software flush primitive, used together with memory fence instructions, lets software order its updates.

Persistent logging implementations built on these ordering primitives were proposed in [22, 7].

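This paper uses the ordering primitives conservatively, only to guarantee that the trail of updates within a transaction is logged to a power-failure-safe region of persistent memory before the transaction commits.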

The log structure proposed in [22, 7] is used here in simplified form.

Memristors for neural branch prediction: a case study in strict latency and write endurance challenges

Publication: Saadeldeen H, Franklin D, Long G, et al. Memristors for neural branch prediction: a case study in strict latency and write endurance challenges[C]//Conf. Computing Frontiers. 2013: 26.

Abstract:

Memristors offer many potential advantages over more traditional memory-cell technologies, including the potential for extreme densities, and fast read times. Current devices, however, are plagued by problems of yield, and durability. We present a limit study of an aggressive neural network application that has a high update rate and a strict latency requirement, analog neural branch predictor. Of course, traditional analog neural network (ANN) implementations of branch predictors are not built with the idea that the underlying bits are likely to fail due to both manufacturing and wear-out issues. Without some careful precautions, a direct one-to-one replacement will result in poor behavior.

We propose a hybrid system that uses SRAM front-end cache, and a distributed-sum scheme to overcome memristors' limitations. Our design can leverage devices with even modest durability (surviving only hours of continuous switching) to provide a system lasting 5 or more years of continuous operation. In addition, these schemes allow for a fault-tolerant design as well. We find that, while a neural predictor benefits from larger density, current technology parameters do not allow high dense, energy-efficient design. Thus, we discuss a range of plausible memristor characteristics that would; as the technology advances; make them practical for our application.

Contribution: overall, this is a compromise design that works around the limitations of the hardware.

Memristors offer high storage density and fast reads, but current memristor devices suffer from low yield and poor durability.
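The hybrid design puts SRAM in front of the memristor array so that frequent weight updates are absorbed before they reach the fragile cells. A minimal sketch of that idea, assuming an LRU write-back SRAM buffer over an array of integer weights (the class name and parameters are hypothetical, and the paper's distributed-sum scheme is not modeled):

```python
from collections import OrderedDict


class SramFrontEnd:
    """LRU write-back buffer that shields a memristor weight array from updates."""

    def __init__(self, backing, capacity):
        self.backing = backing      # weights stored in the memristor array
        self.capacity = capacity
        self.cache = OrderedDict()  # index -> cached weight, in LRU order
        self.backing_writes = 0     # writes that actually reach the memristors

    def update(self, idx, delta):
        if idx in self.cache:
            self.cache.move_to_end(idx)  # mark as most recently used
        else:
            if len(self.cache) >= self.capacity:
                old_idx, old_val = self.cache.popitem(last=False)  # evict LRU
                self.backing[old_idx] = old_val
                self.backing_writes += 1
            self.cache[idx] = self.backing[idx]
        self.cache[idx] += delta


weights = [0] * 8
front = SramFrontEnd(weights, capacity=2)
for i in range(100):
    front.update(i % 2, +1)  # hot weights stay in SRAM
print(front.backing_writes)  # 0
front.update(2, -1)          # forces the LRU weight back to the array
print(front.backing_writes, weights[0])  # 1 50
```

A hundred updates to two hot weights cost a single memristor write, which is the kind of reduction that lets low-endurance cells survive a high update rate.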

[PDF] AC-DIMM: Associative Computing with STT-MRAM

Publication: ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture

Abstract:

With technology scaling, on-chip power dissipation and off-chip memory bandwidth have become significant performance bottlenecks in virtually all computer systems, from mobile devices to supercomputers. An effective way of improving performance in the face of bandwidth and power limitations is to rely on associative memory systems. Recent work on a PCM-based, associative TCAM accelerator shows that associative search capability can reduce both off-chip bandwidth demand and overall system energy. Unfortunately, previously proposed resistive TCAM accelerators have limited flexibility: only a restricted (albeit important) class of applications can benefit from a TCAM accelerator, and the implementation is confined to resistive memory technologies with a high dynamic range (R_High/R_Low), such as PCM.

This work proposes AC-DIMM, a flexible, high-performance associative compute engine built on a DDR3-compatible memory module. AC-DIMM addresses the limited flexibility of previous resistive TCAM accelerators by combining two powerful capabilities---associative search and processing in memory. Generality is improved by augmenting a TCAM system with a set of integrated, user programmable microcontrollers that operate directly on search results, and by architecting the system such that key-value pairs can be co-located in the same TCAM row. A new, bit-serial TCAM array is proposed, which enables the system to be implemented using STT-MRAM. AC-DIMM achieves a 4.2X speedup and a 6.5X energy reduction over a conventional RAM-based system on a set of 13 evaluated applications.
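Functionally, a TCAM row matches a query when every bit it cares about agrees, with masked bits acting as don't-cares. The sketch below models this in software, assuming each row is a (value, care mask) pair; it also shows a bit-serial variant that scans one bit column per step, the access pattern a bit-serial array implies. It models search semantics only, not the AC-DIMM circuit or its microcontrollers, and all names are hypothetical.

```python
def tcam_search(rows, query):
    """Indices of rows whose care bits match `query` (0 in care mask = don't care)."""
    return [i for i, (value, care) in enumerate(rows) if (query ^ value) & care == 0]


def tcam_search_bit_serial(rows, query, width):
    """Same result, computed one bit column at a time."""
    match = [True] * len(rows)
    for bit in range(width):
        q = (query >> bit) & 1
        for i, (value, care) in enumerate(rows):
            if (care >> bit) & 1 and ((value >> bit) & 1) != q:
                match[i] = False
    return [i for i, m in enumerate(match) if m]


rows = [
    (0b1010, 0b1111),  # exact pattern 1010
    (0b1000, 0b1100),  # pattern 10XX
    (0b0000, 0b0000),  # XXXX: matches everything
]
print(tcam_search(rows, 0b1010))                # [0, 1, 2]
print(tcam_search_bit_serial(rows, 0b0110, 4))  # [2]
```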

DATA PROTECTION FROM WRITE FAILURES IN NONVOLATILE MEMORY

METHOD AND SYSTEM FOR ERROR MANAGEMENT IN A MEMORY DEVICE

[PDF] Data Similarity-aware Computation Infrastructure for the Cloud

[PDF] Techniques for Data Mapping and Buffering to Exploit Asymmetry in Multi-Level Cell (Phase Change) Memory

Understanding the trade-offs in multi-level cell ReRAM memory design

[PDF] AN INTEGRATED SIMULATION INFRASTRUCTURE FOR THE ENTIRE MEMORY HIERARCHY: CACHE, DRAM, NONVOLATILE MEMORY, AND DISK

[PDF] A CASE FOR NONUNIFORM FAULT TOLERANCE IN EMERGING MEMORIES

MK Qureshi - 2013

[PDF] Characterizing the Impact of Process Variation on Write Endurance Enhancing Techniques for Non-Volatile Memory Systems

M Cintra, N Linkewitsch - 2013

... In practice, for non-volatile memory technologies such as PCM, STT-MRAM, and ReRAM, only
endurance in terms of number of write operations are of concern [9]. While some memory
technologies, such as DRAM, have practically unlimited write endurance, these non-volatile ...

[PDF] High-Throughput Low-Latency Fine-Grained Disk Logging

DN Simha, T Chiueh, G Karuppur, P Bose - 2013

... expensive flash based devices. Phase Change Memory (PCM) [16] is a faster alternative
to flash based disks but because of its smaller density and higher cost, it's not easy
to be adopted in near future. Mohan et al. [9] propose to ...

PROGRAMMING OF PHASE-CHANGE MEMORY CELLS

A Pantazi, N Papandreou, C Pozidis, A Sebastian - US Patent 20,130,135,924, 2013

... Phase change memory (PCM) is a non-volatile solid-state memory technology that
exploits the reversible, thermally-assisted switching of specific chalcogenide compounds,
such as GST, between states of different electrical conductivity. ...

Memorage: emerging persistent RAM based malleable main memory and storage architecture

JY Jung, S Cho - Proceedings of the 27th international ACM conference …, 2013

... ISSCC, pages 46--48, 2012. 15. GF Close et al. A 512mb phase-change memory (pcm) in 90nm
cmos achieving 2b/cell. VLSIC, pages 202--203, 2011. 16. J. Condit et al. Better I/O through
byte-addressable, persistent memory. SOSP, pages 133--146, Oct. 2009. ...

[PDF] HMMSched: Hybrid Main Memory-Aware Task Scheduling on Multicore Systems

W Hwang, KH Park - FUTURE COMPUTING 2013, The Fifth International …, 2013

Abstract—The strong demand for larger memory capacity with high energy efficiency creates
the need for a hybrid main memory of DRAM and NVRAM (Non-Volatile RAM). ...

Methods of Combinatorial Processing for Screening Multiple Samples on a Semiconductor Substrate

G Verma, TP Chiang, I Hashim, SG Malhotra… - US Patent 20,130,138,380, 2013

... 10. The method of claim 1, wherein the insulator comprises a first metal oxide, the
first metal oxide comprising one or more of Hf, Al, Ta, Nb, Zr, or Y, and wherein the
memory element is a phase change memory (PCM) element. 11. ...

SYSTEMS AND METHODS FOR IMPROVED COMMUNICATIONS IN A NONVOLATILE MEMORY SYSTEM

NC Seroff, A Fai, NJ Wakrat - US Patent 20,130,138,868, 2013

... charge trapping technology, NOR flash memory, erasable programmable read only memory
(“EPROM”), electrically erasable programmable read only memory (“EEPROM”), ferroelectric
RAM (“FRAM”), magnetoresistive RAM (“MRAM”), phase change memory (“PCM”), or any ...

Spin-transfer torque magnetic random access memory (STT-MRAM)

D Apalkov, A Khvalkovskiy, S Watts, V Nikitin, X Tang… - ACM Journal on Emerging …, 2013

... 1. INTRODUCTION For many decades, existing memory technologies have been successfully scaled
down and improved to achieve higher densities, faster speeds with lower production costs. ...

METHOD AND APPARATUS FOR DISTRIBUTED DIRECT MEMORY ACCESS FOR SYSTEMS ON CHIP

K Ganapathy, R Kanapathippillai, S Shah, G Moussa… - US Patent 20,130,138,877, 2013

... 207 includes a receive FIFO buffer 502, a transmit FIFO buffer 504, a channel register 505, a
data counter 506, a status/control register 507, control logic 508, and a TDM remapper memory
510. ... The voice data in its non-compressed form is PCM or pulse-code modulated data. ...

HARDWARE FILTER FOR TRACKING BLOCK PRESENCE IN LARGE CACHES

GH Loh, MD Hill - US Patent 20,130,138,894, 2013

... 4. The computing system as recited in claim 3, wherein the cache on the second chip utilizes
at least one of the following memory configurations: a dynamic random access memory (DRAM),
a phase-change memory (PCM), an array of memristors (RRAM), and a spin-torque ...

Digest for Localization or Fingerprinted Overlay

RT Pack - US Patent 20,130,138,337, 2013

... implementation, the controller 400 can include, but is not limited to, a processing device 410
and a memory device 420 (eg, non-transitory memory, flash memory, random access memory
(RAM), dynamic random access memory (DRAM), phase change memory (PCM), and/or ...

On-chip caches built on multilevel spin-transfer torque RAM cells and its optimizations

Y Chen, WF Wong, H Li, CK Koh, Y Zhang, W Wen - ACM Journal on Emerging …, 2013

... Read-Before-Write schemes have already been used in some nonvolatile memory designs,
for example, in SLC phase change memory (PCM) [Lee et al. 2007], and the toggling MRAM
[Durlam et al. 2003] designs, for write-energy reduction. ...

Study on the impact of the initialization process on the phase change memory

K Ren, F Rao, Z Song, S Lv, M Zhu, L Wu, B Liu… - Applied Physics Letters, 2013

... next generation universal memory due to its excellent logic compatibility and scaling-favorable
operation schemes.2 As a memory, high validity of ... Short reset pulse can reduce the duration of
melting state of the phase change material (PCM), shortening the time for element ...

Towards greener data centers with storage class memory

IH Doh, YJ Kim, E Kim, J Cho, D Lee, SH Noh - Future Generation Computer Systems, 2013

... For now, it seems as if Phase-Change Memory (PCM), backed by major semiconductor companies
such as Intel and Samsung, is winning the battle. Whether it will win the war is still unknown as new
advancements in other technologies are also being announced [41]. ...

Technology Challenges and Opportunities for Ubiquitous Computing

S Borkar

... High capacity system memory consisting of DRAM and NAND/PCM consume higher energy
because (1) historically the memory architecture was optimized to reduce pin count, which wastes
energy, and (2) higher energy is incurred in IO signaling to provide the necessary ...

Skinflint DRAM System: Minimizing DRAM Chip Writes for Low Power

Y Lee, S Kim, S Hong, J Lee

... Recent study has exploited a similar phenomenon on the emerging memory technologies such
as spin torque transfer memory (STT-RAM) and phase change memory (PCM) to avoid long
latency and significant power consumption of write operations of STT-RAM [29] and PCM ...

UTILITY-BASED MODEL FOR CACHING PROGRAMS IN A CONTENT DELIVERY NETWORK

Y Qian, X Yao, Z Jin, JJ Hao - US Patent 20,130,145,001, 2013

... random access memory (RAM), dynamic random access memory (DRAM), cache, read only
memory (ROM), a programmable read only memory (PROM), a static random access memory
(SRAM), a single in-line memory module (SIMM), a phase-change memory (PCM), a dual ...

Compiler directed write-mode selection for high performance low power volatile PCM

Q Li, L Jiang, Y Zhang, Y He, CJ Xue - Proceedings of the 14th ACM SIGPLAN/ …, 2013

... Architecting emerging Phase Change Memory (PCM) is a promising approach for MCUs
due to its fast read speed and long write endurance. ... Enhancing lifetime and security of
pcm-based main memory with start-gap wear leveling. In MICRO, 2009. ...

FTL2: a hybrid flash translation layer with logging for write reduction in flash memory

T Wang, D Liu, Y Wang, Z Shao - Proceedings of the 14th ACM SIGPLAN/SIGBED …, 2013

... 24. D. Liu, T. Wang, Y. Wang, Z. Qin, and Z. Shao. PCM-FTL: a write-activity-aware NAND flash
memory management scheme for PCM-based embedded systems. In Proceedings of 2011 IEEE
32nd Real-Time Systems Symposium (RTSS '11), pages 357--366, Dec. 2011. ...

BLog: block-level log-block management for NAND flash memory storage systems

Y Guan, G Wang, Y Wang, R Chen, Z Shao - Proceedings of the 14th ACM SIGPLAN/ …, 2013

... In CODES+ISSS '11, pages 325--334, New York, NY, USA, 2011. ACM. 30. P. Zhou, Y. Du, Y.
Zhang, and J. Yang. Fine-grained QoS scheduling for pcm-based main memory systems. In IPDPS
'2010, pages 1--12, apr. 2010. 31. P. Zhou, B. Zhao, J. Yang, and Y. Zhang. ...

Method, apparatus and system for determining an identifier of a volume of memory

S Qawami, RW Faber - US Patent 8,463,948, 2013

... GeSbTe), or GST. As with conventional PCM devices, memory 130 may, in an
embodiment, store data to PCM cells by variously switching their respective
chalcogenide glass elements between crystalline and amorphous states. ...

Impact of Persistent Random Access Memories on Software Systems

A Badam

... how applications perceive storage. Persistent random access memory (PRAM)
technologies like phase change memory (PCM) promise to create a similar revolution
for the memory sub-system. PRAM promises to be byte-addressable ...