6.3~6.5.Cache Key Point
\(6.3 \sim 6.5\).Cache Key Point
1.Cache Read Process
Suppose we have a direct-mapped cache described by \((S, E, B, m) = (4, 1, 2, 4)\), and we only read word(1 byte).
If block \(x \mod setNum\) equals block \(y \mod setNum\), then block x and y belong to the same set.
Since a block can contain two words, when reading the address \(x\)(suppose x is odd), both information in \(x\) and \(x+1\) will be read into the block.
Let us simulate the cache in action as the CPU performs a sequence of reads:
- Initially, the cache is empty(i.e., each valid bit is 0):
- Read word at address 0: Since the valid bit for set 0 is 0, this is
a cache miss. The cache fetches block 0 from memory and
stores the block in set 0. Then the cache returns
m[0]
(the contents of memory location 0) fromblock[0]
of the newly fetched cache line.
- Read word at address 1: This is a cache hit, The cache immediately
returns
m[1]
fromblock[1]
of the cache line. The state of the cache does not change.
\(e.g.\)Consider the following transpose routine:
1 | typedef int array[2][2]; |
Assume this code runs on a machine with the following properties:
sizeof(int) = 4
.The
src
array starts at address 0 and thedst
array starts at address 16 (decimal).There is a single L1 data cache that is direct-mapped, write-through, and write-allocate, with a block size of 8 bytes.
The cache has a total size of 16 data bytes and the cache is initially empty.
Accesses to the
src
anddst
arrays are the only sources of read and write misses, respectively.
For each row and column, indicate whether the access to
src[row][col]
and dst[row][col]
is a hit(h) or
a miss(m). For example, reading src[0][0]
is a miss and
writing dst[0][0]
is also a miss:
\(dst \; array\) | | Col. 0 | Col. 1 | |--------|--------|--------| | Row 0 | m | | | Row 1 | | |
\(src \; array\) | | Col. 0 | Col. 1 | |--------|--------|--------| | Row 0 | m | | | Row 1 | | |
\(solution:\) Because the cache is too small to hold both arrays, references to one array keep evicting useful lines from the other array:
\(dst \; array\) | | Col. 0 | Col. 1 | |--------|--------|--------| | Row 0 | m | m | | Row 1 | m | m |
\(src \; array\) | | Col. 0 | Col. 1 | |--------|--------|--------| | Row 0 | m | m | | Row 1 | m | h |
2.Conflict Misses in Direct-Mapped Caches
Periodically misses in direct-mapped caches typically occur when programs access arrays whose sizes are a power of 2.
Take the following code as an example:
1 | float dotprod(float x[8], float y[8]) |
Suppose that: * floats are 4 bytes.
x
is loaded into the 32 bytes of contiguous memory starting at address 0, and thaty
starts immediately afterx
at address 32.A block is 16 bytes (big enough to hold four floats)
the cache consists of two sets, for a total cache size of 32 bytes.
Given these assumptions, each x[i]
and
y[i]
will map to the identical cache set:
Element | Address | Set index | Element | Address | Set index |
---|---|---|---|---|---|
x[0] | 0 | 0 | y[0] | 32 | 0 |
x[1] | 4 | 0 | y[1] | 36 | 0 |
x[2] | 8 | 0 | y[2] | 40 | 0 |
x[3] | 12 | 0 | y[3] | 44 | 0 |
x[4] | 16 | 1 | y[4] | 48 | 1 |
x[5] | 20 | 1 | y[5] | 52 | 1 |
x[6] | 24 | 1 | y[6] | 56 | 1 |
x[7] | 28 | 1 | y[7] | 60 | 1 |
The first iteration of the loop references
x[0]
, the block containingx[0]–x[3]
is loaded into set 0.The next reference is to
y[0]
, another miss that causes the block containingy[0]–y[3]
to be copied into set 0, overwriting the values of x that were copied in by the previous reference.
The term thrashing describes any situation where a cache is repeatedly loading and evicting the same sets of cache blocks.
One easy solution to this problem is to put B bytes of
padding at the end of each array. For example, instead of
defining x
to be float x[8]
, we define
it to be float x[12]
:
Element | Address | Set index | Element | Address | Set index |
---|---|---|---|---|---|
x[0] | 0 | 0 | y[0] | 48 | 1 |
x[1] | 4 | 0 | y[1] | 52 | 1 |
x[2] | 8 | 0 | y[2] | 56 | 1 |
x[3] | 12 | 0 | y[3] | 60 | 1 |
x[4] | 16 | 1 | y[4] | 64 | 0 |
x[5] | 20 | 1 | y[5] | 68 | 0 |
x[6] | 24 | 1 | y[6] | 72 | 0 |
x[7] | 28 | 1 | y[7] | 76 | 0 |
x[i]
and y[i]
now map to different
sets