6.3~6.5.Cache Key Point

\(6.3 \sim 6.5\).Cache Key Point

1.Cache Read Process

   Suppose we have a direct-mapped cache described by \((S, E, B, m) = (4, 1, 2, 4)\), and we only read word(1 byte).

  • If block \(x \mod setNum\) equals block \(y \mod setNum\), then block x and y belong to the same set.

  • Since a block can contain two words, when reading the address \(x\)(suppose x is odd), both information in \(x\) and \(x+1\) will be read into the block.

  Let us simulate the cache in action as the CPU performs a sequence of reads:

  1. Initially, the cache is empty(i.e., each valid bit is 0):

  1. Read word at address 0: Since the valid bit for set 0 is 0, this is a cache miss. The cache fetches block 0 from memory and stores the block in set 0. Then the cache returns m[0] (the contents of memory location 0) from block[0] of the newly fetched cache line.

  1. Read word at address 1: This is a cache hit, The cache immediately returns m[1] from block[1] of the cache line. The state of the cache does not change.

  \(e.g.\)Consider the following transpose routine:

1
2
3
4
5
6
7
8
9
10
11
12
typedef int array[2][2];

void transpose1(array dst, array src)
{
int i, j;

for (i = 0; i < 2; i++) {
for (j = 0; j < 2; j++) {
dst[j][i] = src[i][j];
}
}
}

  Assume this code runs on a machine with the following properties:

  • sizeof(int) = 4.

  • The src array starts at address 0 and the dst array starts at address 16 (decimal).

  • There is a single L1 data cache that is direct-mapped, write-through, and write-allocate, with a block size of 8 bytes.

  • The cache has a total size of 16 data bytes and the cache is initially empty.

  • Accesses to the src and dst arrays are the only sources of read and write misses, respectively.

  For each row and column, indicate whether the access to src[row][col] and dst[row][col] is a hit(h) or a miss(m). For example, reading src[0][0] is a miss and writing dst[0][0] is also a miss:

  \(dst \; array\) | | Col. 0 | Col. 1 | |--------|--------|--------| | Row 0 | m | | | Row 1 | | |

  \(src \; array\) | | Col. 0 | Col. 1 | |--------|--------|--------| | Row 0 | m | | | Row 1 | | |

  \(solution:\) Because the cache is too small to hold both arrays, references to one array keep evicting useful lines from the other array:

  \(dst \; array\) | | Col. 0 | Col. 1 | |--------|--------|--------| | Row 0 | m | m | | Row 1 | m | m |

  \(src \; array\) | | Col. 0 | Col. 1 | |--------|--------|--------| | Row 0 | m | m | | Row 1 | m | h |

2.Conflict Misses in Direct-Mapped Caches

  Periodically misses in direct-mapped caches typically occur when programs access arrays whose sizes are a power of 2.

  Take the following code as an example:

1
2
3
4
5
6
7
8
9
10
float dotprod(float x[8], float y[8])
{
float sum = 0.0;
int i;

for (i = 0; i < 8; i++)
sum += x[i] * y[i];

return sum;
}

  Suppose that: * floats are 4 bytes.

  • x is loaded into the 32 bytes of contiguous memory starting at address 0, and that y starts immediately after x at address 32.

  • A block is 16 bytes (big enough to hold four floats)

  • the cache consists of two sets, for a total cache size of 32 bytes.

  Given these assumptions, each x[i] and y[i] will map to the identical cache set:

Element Address Set index Element Address Set index
x[0] 0 0 y[0] 32 0
x[1] 4 0 y[1] 36 0
x[2] 8 0 y[2] 40 0
x[3] 12 0 y[3] 44 0
x[4] 16 1 y[4] 48 1
x[5] 20 1 y[5] 52 1
x[6] 24 1 y[6] 56 1
x[7] 28 1 y[7] 60 1
  • The first iteration of the loop references x[0], the block containing x[0]–x[3] is loaded into set 0.

  • The next reference is to y[0], another miss that causes the block containing y[0]–y[3] to be copied into set 0, overwriting the values of x that were copied in by the previous reference.

  The term thrashing describes any situation where a cache is repeatedly loading and evicting the same sets of cache blocks.

  One easy solution to this problem is to put B bytes of padding at the end of each array. For example, instead of defining x to be float x[8], we define it to be float x[12]:

Element Address Set index Element Address Set index
x[0] 0 0 y[0] 48 1
x[1] 4 0 y[1] 52 1
x[2] 8 0 y[2] 56 1
x[3] 12 0 y[3] 60 1
x[4] 16 1 y[4] 64 0
x[5] 20 1 y[5] 68 0
x[6] 24 1 y[6] 72 0
x[7] 28 1 y[7] 76 0

  x[i] and y[i] now map to different sets