note What Every Programmer Should Know About Memory


Abstract

As CPU cores become both faster and more numerous, the limiting factor for most programs is now, and will be for some time, memory access. Hardware designers have come up with ever more sophisticated memory handling and acceleration techniques, such as CPU caches, but these cannot work optimally without some help from the programmer. Unfortunately, neither the structure nor the cost of using the memory subsystem of a computer or the caches on CPUs is well understood by most programmers. This paper explains the structure of memory subsystems in use on modern commodity hardware, illustrating why CPU caches were developed, how they work, and what programs should do to achieve optimal performance by utilizing them.

 

Introduction

Today, the hardware changes aimed at removing main memory as a bottleneck mainly come in the following forms:

RAM hardware design (speed and parallelism).

Memory controller designs.

CPU caches.

Direct memory access (DMA) for devices.

For the most part, this document will deal with CPU caches and some effects of memory controller design.

This document is in no way all-inclusive or final. It is limited to commodity hardware and further limited to a subset of that hardware.

When it comes to operating-system-specific details and solutions, the text exclusively describes Linux.

One last comment before the start. The text contains a number of occurrences of the term "usually" and other, similar qualifiers.

 

Document Structure

This document is mostly for software developers.

To that end, the second section describes random-access memory (RAM) in technical detail.

The third section goes into a lot of detail about CPU cache behavior.

Section 5 goes into a lot of detail about Non-Uniform Memory Access (NUMA) systems.

Section 6 is the central section of the paper. It brings together all the previous sections' information and gives programmers advice on how to write code which performs well in the various situations.

Section 7 introduces tools which can help the programmer do a better job.

In section 8 we finally give an outlook of technology which can be expected in the near future or which might just simply be good to have.

 

Commodity Hardware Today

All CPUs are connected via a common bus (the Front Side Bus, FSB) to the Northbridge. The Northbridge contains, among other things, the memory controller, and its implementation determines the type of RAM chips used for the computer. Different types of RAM, such as DRAM, Rambus, and SDRAM, require different memory controllers.

To reach all other system devices, the Northbridge must communicate with the Southbridge. The Southbridge, often referred to as the I/O bridge, handles

communication with devices through a variety of different buses.

Such a system structure has a number of noteworthy consequences:

1. All data communication from one CPU to another must travel over the same bus used to communicate with the Northbridge.

2. All communication with RAM must pass through the Northbridge.

3. The RAM has only a single port.

4. Communication between a CPU and a device attached to the Southbridge is routed through the Northbridge.
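
The consequences above can be checked against a small model of the topology. The following is only a sketch: the component names (CPU1, CPU2, USB, SATA) are made up for illustration, the FSB is modeled as a node of its own so that shared-bus traffic shows up in the path, and the links are the ones described in the text.

```python
from collections import deque

# Hypothetical component names; the edges mirror the links described in the text.
LINKS = {
    "CPU1": ["FSB"],
    "CPU2": ["FSB"],
    "FSB": ["CPU1", "CPU2", "Northbridge"],
    "Northbridge": ["FSB", "RAM", "Southbridge"],
    "RAM": ["Northbridge"],
    "Southbridge": ["Northbridge", "USB", "SATA"],
    "USB": ["Southbridge"],
    "SATA": ["Southbridge"],
}


def route(src, dst):
    """Return the shortest hop-by-hop path from src to dst (breadth-first search)."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in LINKS[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no connection between the two components


if __name__ == "__main__":
    print(route("CPU1", "CPU2"))  # CPU-to-CPU traffic shares the FSB
    print(route("CPU1", "RAM"))   # RAM is reachable only via the Northbridge
    print(route("CPU1", "SATA"))  # Southbridge devices are reached via the Northbridge
```

In this model every path from a CPU to RAM or to a Southbridge device contains the Northbridge, and both CPUs compete for the same FSB node, which is where the bottlenecks discussed next come from.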

 

A couple of bottlenecks are immediately apparent in this design. One such bottleneck involves access to RAM for devices. In the earliest days of the PC, all communication with devices on either bridge had to pass through the CPU, negatively impacting overall system performance. To work around this problem some devices became capable of direct memory access (DMA). DMA allows devices, with the help of the Northbridge, to store and receive data in RAM directly without the intervention of the CPU.
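
To make the difference concrete, here is a toy comparison of the two data paths. It is a sketch under assumptions, not a measurement: the device is assumed to sit on the Southbridge, the hop lists are written out by hand from the description above, and counting hops ignores bus speeds and contention entirely.

```python
# Toy model: each list names the hops a block of data takes from a device to RAM.
# Component names are illustrative; the device is assumed to hang off the Southbridge.
WITHOUT_DMA = ["device", "Southbridge", "Northbridge", "FSB", "CPU",
               "FSB", "Northbridge", "RAM"]
WITH_DMA = ["device", "Southbridge", "Northbridge", "RAM"]


def hops(path):
    """Number of links the data crosses along the path."""
    return len(path) - 1


def cpu_involved(path):
    """True if the data has to pass through the CPU."""
    return "CPU" in path


if __name__ == "__main__":
    for name, path in (("without DMA", WITHOUT_DMA), ("with DMA", WITH_DMA)):
        print(name, "hops:", hops(path), "CPU involved:", cpu_involved(path))
```

Even in this crude model the DMA path is shorter and leaves the CPU and the FSB out of the transfer. The Northbridge is still on the path, so device traffic and CPU traffic continue to compete for access to RAM.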

 

A second bottleneck involves the bus from the Northbridge to the RAM.