Memory Cache

Source: Internet  Editor: 程序博客网  Time: 2024/06/08 08:09


Introduction to Caches 

Data structure of a cache slot

Slots

Let's get into more specific details about the cache. In particular, we'll describe an individual slot.

A slot consists of the following:

  • V the valid bit, indicating whether the slot holds valid data. If V = 1, then the data is valid. If V = 0, the data is not valid. Initially, it's invalid. Once data is placed into the slot, it's valid.

  • D the dirty bit. This bit only has meaning if V = 1. It indicates whether the data in the slot (to be discussed momentarily) has been modified (written to). If D = 1, the data has been modified since entering the cache. If D = 0, the data is the same as it was when it first entered the cache.

  • Tag The tag represents the upper bits of the address. The size of the tag is 32 - lg N where N is the number of bytes in the data part of the slot.

  • Cache Line This is the actual data itself. There are N bytes, where N is a power of 2. We will also call this the data block.

The total amount of memory used by a slot that stores 32 bytes in its cache line is easy to compute. There are 2 bits (one each for V and D), plus the tag (32 - lg 32 = 27 bits), plus the data in the cache line (32 x 1 byte = 32 bytes = 256 bits). That's a total of 2 + 27 + 256 = 285 bits.
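The arithmetic above can be checked with a short sketch. The function name `slot_bits` and its parameters are illustrative assumptions, not part of the original notes; it assumes 32-bit addresses and the slot layout described above (V, D, tag, data).

```python
def slot_bits(line_bytes: int, address_bits: int = 32) -> int:
    """Total bits in one slot: valid bit + dirty bit + tag + data."""
    offset_bits = line_bytes.bit_length() - 1   # lg N, valid when N is a power of 2
    tag_bits = address_bits - offset_bits       # the remaining upper address bits
    data_bits = line_bytes * 8                  # N bytes of cache line
    return 2 + tag_bits + data_bits             # 2 = V bit + D bit

print(slot_bits(32))  # 2 + 27 + 256 = 285
```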

Here's a diagram of one slot.

The slot layout varies slightly between schemes, which we'll discuss when we talk about direct mapped and set associative caches.


Creating a new cache slot

When you choose a slot to be evicted, you look at the dirty bit (we assume a write-back policy). If D = 1, i.e., the slot is dirty, first copy the cache line back into memory. If D = 0, no copy is needed. Either way, you can then overwrite the slot with the new data, tag, etc.

When the new cache line enters,

  • V is set to 1 (it's valid)
  • D is set to 0 (no longer dirty)
  • Tag bits are copied in
  • Cache line is copied in from main memory
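The eviction and refill steps can be sketched as follows. This is a minimal model under stated assumptions: `Slot`, `replace_line`, and the `memory` bytearray are hypothetical names, main memory is modeled as a flat bytearray, and the tag is taken to be the address divided by the line size (the fully associative layout), so a line's base address is `tag * LINE_BYTES`.

```python
from dataclasses import dataclass, field

LINE_BYTES = 32  # assumed cache line size

@dataclass
class Slot:
    valid: bool = False
    dirty: bool = False
    tag: int = 0
    line: bytearray = field(default_factory=lambda: bytearray(LINE_BYTES))

def replace_line(slot: Slot, new_tag: int, memory: bytearray) -> None:
    """Evict the slot's current line (write-back policy), then load a new one."""
    if slot.valid and slot.dirty:
        # Dirty line: copy it back to memory before it is overwritten.
        old_base = slot.tag * LINE_BYTES
        memory[old_base:old_base + LINE_BYTES] = slot.line
    new_base = new_tag * LINE_BYTES
    slot.line = bytearray(memory[new_base:new_base + LINE_BYTES])  # copy from memory
    slot.tag = new_tag    # tag bits are copied in
    slot.valid = True     # the slot now holds valid data
    slot.dirty = False    # freshly loaded data matches memory
```

A clean line (D = 0) skips the write-back entirely, which is the saving that the dirty bit buys.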


Fully Associative Cache

A memory block can be cached in any slot

Fully Associative Scheme

Suppose we are trying to access a byte at address A31-0. We know that the address can be split into two parts: A31-5, which is the tag, and A4-0, which is the offset.

The cache line containing that byte covers 32 addresses. We generate them by keeping the upper 27 bits, A31-5, and letting the lower bits A4-0 take on all 32 possible 5-bit bitstrings.
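As an illustration, the split into tag and offset, and the 32 addresses that share a cache line, can be written with shifts and masks. The function names here are assumptions made for this sketch.

```python
OFFSET_BITS = 5  # lg 32, for 32-byte cache lines

def split_address(addr: int) -> tuple[int, int]:
    """Return (tag, offset) for the fully associative scheme."""
    tag = addr >> OFFSET_BITS                   # bits 31-5
    offset = addr & ((1 << OFFSET_BITS) - 1)    # bits 4-0
    return tag, offset

def line_addresses(addr: int) -> list[int]:
    """All 32 byte addresses that live in the same cache line as addr."""
    tag, _ = split_address(addr)
    base = tag << OFFSET_BITS                   # upper 27 bits kept, low 5 bits zero
    return [base | offset for offset in range(1 << OFFSET_BITS)]

tag, offset = split_address(0x12345678)
print(hex(tag), offset)  # offset is 24 (0b11000)
```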

Which slot should the cache line go to, assuming the data is not already in the cache? In a fully associative scheme, you can pick any slot. However, there are some intelligent choices you can make.

  • If there is any slot whose valid bit is 0 (V = 0), pick that slot.
  • If all the slots are valid, you need to pick a slot to evict. In the introduction to cache notes, there is a list of possible eviction policies. A slot is chosen to be evicted, and the new cache line is placed there, with the valid and dirty bits updated, as well as the tag.
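The two rules above can be sketched as a small selection function. The notes do not fix an eviction policy at this point, so random eviction is used here purely as an assumed placeholder; `choose_slot` is a hypothetical name.

```python
import random

def choose_slot(valid_bits: list[bool], rng=random) -> int:
    """Pick the slot index that should receive a new cache line."""
    for index, valid in enumerate(valid_bits):
        if not valid:
            return index                      # an invalid slot costs nothing to reuse
    # Every slot is valid: fall back to an eviction policy.
    # Assumption: random eviction, standing in for whatever policy is chosen.
    return rng.randrange(len(valid_bits))
```

For example, `choose_slot([True, False, True])` returns 1, the only invalid slot.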

Finding the Slot

How would you determine whether the cache line you are looking for is in the cache? Assume we're trying to load or store at address B31-0.

Since the cache line can be in any slot, we will have to look at every slot. Hardware has one advantage over software when it comes to searching the slots. You can do searches in parallel, instead of examining each slot one at a time.

To do the search, first get the tag bits out of the address: these are bits B31-5. You must simultaneously compare the address tag bits to the tag of each slot. This can be done using a comparator per slot, which can simply be a bunch of XNOR gates (a two-input XNOR, XNOR2, is true when two bits have the same value) whose outputs are ANDed together: the tags match only if every bit pair matches.

You must also determine whether the slot's valid bit is 1. For a match to occur, the address tag must match the slot tag, and V = 1.
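A software sketch of the match condition follows. Hardware compares all slots at once; this loop visits them one at a time, so it models only the logic, not the parallelism. The names `tags_equal` and `find_slot`, and the representation of slots as (valid, tag) pairs, are assumptions for illustration.

```python
TAG_BITS = 27
TAG_MASK = (1 << TAG_BITS) - 1

def tags_equal(a: int, b: int) -> bool:
    """Bitwise XNOR of the two tags, then AND of all result bits."""
    xnor = ~(a ^ b) & TAG_MASK      # a bit is 1 where the two tags agree
    return xnor == TAG_MASK         # all 27 bits agree

def find_slot(slots: list[tuple[bool, int]], addr: int):
    """Return the index of the matching slot, or None on a miss."""
    addr_tag = addr >> 5
    matches = [i for i, (valid, tag) in enumerate(slots)
               if valid and tags_equal(tag, addr_tag)]   # match needs V = 1 too
    if len(matches) > 1:
        raise ValueError("faulty cache: more than one slot matched")
    return matches[0] if matches else None
```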

At most one slot should match. If more than one slot matches, then you have a faulty fully-associative cache scheme. A cache line should never be stored in more than one slot of a fully-associative cache. Multiple copies are hard to keep consistent and serve no purpose, since the extra slots could be used for other cache lines.

The hardware for finding the right slot, and for picking a slot to fill or evict when more than one choice is available, is rather large, so fully associative caches are rarely used in practice for large caches. The complexity of the fully associative hardware actually slows down the overall speed of the cache.


Summary -- used only for TLBs

In a fully associative scheme, any slot can store the cache line. The hardware for finding whether the desired data is in the cache requires comparing the tag bits of the address to the tag bits of every slot (in parallel), and making sure the valid bit is set.

If there is a cache miss, the initial goal is to pick a slot that's not valid to place the data being searched for. If there are no invalid slots, the hardware must pick a slot to evict based on some eviction policy. If the evicted slot's dirty bit is 1, then the data in the slot must be copied back to RAM. If it's 0, there's no need to copy back, since the data in the cache line should be identical to the data in the corresponding memory locations in RAM.

Then, the new data block must be copied from RAM to the slot, and V set to 1, D set to 0, and the slot tag copied from the address tag.

The hardware for a fully-associative cache can be rather complex, which is why you don't see fully-associative caches (except for translation lookaside buffers).


Direct Mapped Cache

A memory block can be cached in exactly one fixed slot

Direct Mapped Scheme

Assume a cache with 128 slots, each holding a 32-byte cache line. In a direct mapped cache, we treat the 128 slots as if they were an array of slots. We can index into the array using binary numbers. For 128 slots, you need 7 bits. Thus, the index of the array runs from 0000000 up to 1111111 in binary.

How do we decide which slot the cache line associated with address A31-0 should go into?

Here's how we do it:

Since we have 128 slots, we need to specify which slot the cache line goes in. This requires lg 128 = 7 bits. Where do we get the bits from? Directly from the address itself.

Bits A4-0 are still the offset. The slot number is the next 7 bits, A11-5. The remaining bits, A31-12, are the tag.

Finding the Slot

Finding a slot is now easy. Suppose you have address B31-0.
  • Use bits B11-5 to find the slot.
  • See if the slot is valid (V = 1) and bits B31-12 match the tag of the slot.
  • If so, get the byte at offset B4-0.
  • If not, fetch the 32 bytes from memory, and place them in the slot, updating the valid bit, dirty bit, and tag as needed.
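The lookup steps above can be sketched in a few lines, assuming 128 slots and 32-byte lines. The function names and the (valid, tag) slot representation are illustrative assumptions.

```python
OFFSET_BITS = 5   # 32-byte cache lines
INDEX_BITS = 7    # 128 slots

def split_direct_mapped(addr: int) -> tuple[int, int, int]:
    """Return (tag, slot index, offset) for a 32-bit address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)                  # bits 4-0
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)   # bits 11-5
    tag = addr >> (OFFSET_BITS + INDEX_BITS)                  # bits 31-12
    return tag, index, offset

def is_hit(slots: list[tuple[bool, int]], addr: int) -> bool:
    """Only one slot needs checking: the one selected by the index bits."""
    tag, index, _ = split_direct_mapped(addr)
    valid, slot_tag = slots[index]
    return valid and slot_tag == tag

print(split_direct_mapped(0x12345678))  # (0x12345, 51, 24) printed in decimal
```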

Drawbacks -- prone to conflicts

The drawback of this scheme is obvious. Effectively, we have a hash table with a very simple hash function (use bits B11-5). This can cause collisions at that particular slot.

Unlike a hash table, we do not resolve such collisions by finding the next free slot. Instead, we overwrite the value in the slot.

Just think of the parking lot analogy. A direct mapped scheme works poorly if many students have permits for the same spot, while other spots have very few permits.

The data you have might simply map to the same slot, and you could have cache lines going in and out all the time.
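To make the collision problem concrete, here is a small simulation sketch. It assumes the 128-slot, 32-byte-line direct mapped cache from above; `count_misses_direct_mapped` is a hypothetical helper that tracks only tags, not data.

```python
def count_misses_direct_mapped(addresses, index_bits=7, offset_bits=5):
    """Count misses for a sequence of byte addresses in a direct mapped cache."""
    tags = {}      # slot index -> tag currently stored; a missing key means invalid
    misses = 0
    for addr in addresses:
        index = (addr >> offset_bits) & ((1 << index_bits) - 1)
        tag = addr >> (offset_bits + index_bits)
        if tags.get(index) != tag:
            misses += 1
            tags[index] = tag      # overwrite whatever was in the slot
    return misses

# Two addresses 4096 bytes apart share index bits 11-5 but differ in the tag,
# so each access evicts the line the other one needs.
pattern = [0x0000, 0x1000] * 4
print(count_misses_direct_mapped(pattern))  # 8: every access misses
```

Changing the second address to 0x0020 (a different slot) drops the count to 2, the unavoidable first-touch misses.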

Advantages -- simple

If there's an advantage to the scheme, it's that it's very simple. You don't have to simultaneously match tags with all slots. You just have one slot to check.

If the data isn't in the slot, it's obvious that this is the slot to get evicted.


Set Associative Cache

A hybrid cache: each memory location maps to a set containing 1/n of the cache slots, where n = 2^k is the number of sets

Introduction

A set-associative scheme is a hybrid between a fully associative cache, and direct mapped cache. It's considered a reasonable compromise between the complex hardware needed for fully associative caches (which requires parallel searches of all slots), and the simplistic direct-mapped scheme, which may cause collisions of addresses to the same slot (similar to collisions in a hash table).

Let's assume, as we did for fully associative and direct mapped caches, that we have:

  • 128 slots
  • 32 bytes per slot

Furthermore, let's assume that we can group slots together into sets. In particular, we will assume that we have 8 slots per set.

Parking Lot Analogy

Suppose we have 1000 parking spots. This time, instead of using a 3 digit number for each parking spot, we use 2 digits. Thus, the parking spots are numbered 00 up to 99.

However, instead of one parking spot per number, we have 10 for each number. Thus, there are ten parking spots numbered 00, ten numbered 01, ..., and ten numbered 99.

Your parking spot is based on the first 2 digits of your student ID number.

In this case, you use the first 2 digits of your student ID, and have up to 10 different parking spots you can park at. This gives you some flexibility about where to park.

In effect, the various parking permits on a large commuter campus work just like that. There are many lots, each with their own letter or number. You are given a permit for a particular lot, but you can park anywhere within this lot. The advantage is that you only have to search for a spot in one large lot, as opposed to searching for a parking spot in all of campus.

Set Associative Scheme

Like the direct mapped scheme, we still treat the slots like an array. The slots are still numbered 0000000 up to 1111111 (there are 128 slots).

However, we group the slots into sets, and the key is to keep track of the sets, instead of the slots.

How many sets do we have? 128 slots divided by 8 slots per set gives us 16 sets.

We need to specify the set number, instead of the slot number, and that takes lg 16 = 4 bits.

Here's how the bits of the address break down. It's very similar to direct mapped, except we use 4 bits for the set, instead of the slot.

Bits A4-0 are still the offset. The set number is the next 4 bits, A8-5. The remaining bits, A31-9, are the tag.
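The bit breakdown can be derived from the cache parameters, as in this sketch. The constant and function names are assumptions for illustration; the numbers are the ones used in the text (128 slots, 8 slots per set, 32-byte lines).

```python
NUM_SLOTS = 128
WAYS = 8          # slots per set
LINE_BYTES = 32

NUM_SETS = NUM_SLOTS // WAYS                 # 128 / 8 = 16 sets
SET_BITS = NUM_SETS.bit_length() - 1         # lg 16 = 4
OFFSET_BITS = LINE_BYTES.bit_length() - 1    # lg 32 = 5

def split_set_associative(addr: int) -> tuple[int, int, int]:
    """Return (tag, set number, offset) for a 32-bit address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)                    # bits 4-0
    set_number = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)  # bits 8-5
    tag = addr >> (OFFSET_BITS + SET_BITS)                      # bits 31-9
    return tag, set_number, offset

print(split_set_associative(0x12345678))  # set number 3, offset 24
```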

Finding the Slot

Finding a slot is more complex than in direct-mapped caches. Suppose you have address B31-0.
  • Use bits B8-5 to find the set.
  • This should specify 8 slots (since we said there were 8 slots per set). The slots should have the following slot indexes, each formed by the 4 bits B8-5 followed by 3 more bits:
    • B8-5 000
    • B8-5 001
    • B8-5 010
    • B8-5 011
    • B8-5 100
    • B8-5 101
    • B8-5 110
    • B8-5 111
    In effect, the set number specifies the upper 4 bits of the index, and the bottom 3 bits are all possible 3 bit bitstring values.
  • Search all 8 slots to see if the tag B31-9 matches the tag in a valid slot.
  • If it matches one of the slots, get the byte at offset B4-0.
  • If not, decide which slot in the set should be used (possibly evicting a slot), fetch the 32 bytes from memory, and place them in the slot, updating the valid bit, dirty bit, and tag as needed.
This is called 8-way set associative cache, since each set contains 8 slots. You can have N-way set-associative caches, where each set contains N slots (where N is a power of 2).
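The lookup steps above can be sketched as follows. Hardware would compare the 8 tags of a set in parallel; the loop here only models the logic. `slots_in_set` and `find_in_set` are hypothetical names, and slots are represented as (valid, tag) pairs in a flat list of 128 entries.

```python
WAYS = 8  # slots per set

def slots_in_set(set_number: int, ways: int = WAYS) -> list[int]:
    """Slot indexes of a set: set number as upper bits, all 3-bit suffixes below."""
    first = set_number * ways               # same as shifting left by lg(ways) = 3
    return list(range(first, first + ways))

def find_in_set(slots: list[tuple[bool, int]], addr: int):
    """Return the index of the slot holding addr's cache line, or None on a miss."""
    tag = addr >> 9                         # bits 31-9
    set_number = (addr >> 5) & 0xF          # bits 8-5
    for index in slots_in_set(set_number):
        valid, slot_tag = slots[index]
        if valid and slot_tag == tag:
            return index
    return None

print(slots_in_set(0b1010))  # indexes 0b1010000 through 0b1010111, i.e. 80..87
```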

Compromises

This scheme is a compromise. You only have to use the complex comparison hardware (to find the correct slot) on a small set of slots, instead of over all the slots. Presumably, such comparison hardware is more than linear in the number of slots, so the fewer the slots you need to search through, the less overall hardware is needed.

Yet, you gain the flexibility of allowing up to N cache lines per set for an N-way set associative scheme, so up to N lines that map to the same set can be cached at once.

Summary

A set-associative cache scheme is a combination of fully associative and direct mapped schemes. You group slots into sets. You find the appropriate set for a given address (which is like the direct mapped scheme), and within the set you find the appropriate slot (which is like the fully associative scheme).

This scheme has fewer collisions because you have more slots to pick from, even when cache lines map to the same set.
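A rough way to see the difference is to replay one access pattern against both organizations. The sketch below is a simplified model, not a description of real hardware: it tracks tags only, and it assumes least-recently-used eviction within a set, which the notes do not specify. Setting `ways=1` gives a direct mapped cache, and `ways=num_slots` gives a fully associative one.

```python
from collections import OrderedDict

def count_misses(addresses, num_slots=128, ways=1, line_bytes=32):
    """Count misses for an N-way set associative cache (assumed LRU eviction)."""
    num_sets = num_slots // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # per set: tags, oldest first
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        set_number, tag = block % num_sets, block // num_sets
        lines = sets[set_number]
        if tag in lines:
            lines.move_to_end(tag)        # hit: mark as most recently used
            continue
        misses += 1
        if len(lines) == ways:
            lines.popitem(last=False)     # set is full: evict least recently used
        lines[tag] = None
    return misses

# Three lines 4096 bytes apart collide in one slot when direct mapped,
# but fit side by side in one 8-slot set.
pattern = [0x0000, 0x1000, 0x2000] * 10
print(count_misses(pattern, ways=1))   # 30: every access misses
print(count_misses(pattern, ways=8))   # 3: only the first touch of each line misses
```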