6.3~6.5.Cache Key Point

发表于 2024-08-11 分类于 CSAPP ， Chapter 6.The Memory Hierarchy 阅读次数：本文字数： 873 阅读时长 ≈ 3 分钟

\(6.3 \sim 6.5\).Cache Key Point

1.Cache Read Process

Suppose we have a direct-mapped cache described by \((S, E, B, m) = (4, 1, 2, 4)\), and we only read word(1 byte).

If block \(x \mod setNum\) equals block \(y \mod setNum\), then block x and y belong to the same set.
Since a block can contain two words, when reading the address \(x\)(suppose x is odd), both information in \(x\) and \(x+1\) will be read into the block.

Let us simulate the cache in action as the CPU performs a sequence of reads:

Initially, the cache is empty(i.e., each valid bit is 0):

Read word at address 0: Since the valid bit for set 0 is 0, this is a cache miss. The cache fetches block 0 from memory and stores the block in set 0. Then the cache returns m[0] (the contents of memory location 0) from block[0] of the newly fetched cache line.

Read word at address 1: This is a cache hit, The cache immediately returns m[1] from block[1] of the cache line. The state of the cache does not change.

\(e.g.\)Consider the following transpose routine:

typedef int array[2][2];

void transpose1(array dst, array src)
{
    int i, j;

    for (i = 0; i < 2; i++) {
        for (j = 0; j < 2; j++) {
            dst[j][i] = src[i][j];
        }
    }
}

Assume this code runs on a machine with the following properties:

sizeof(int) = 4.
The src array starts at address 0 and the dst array starts at address 16 (decimal).
There is a single L1 data cache that is direct-mapped, write-through, and write-allocate, with a block size of 8 bytes.
The cache has a total size of 16 data bytes and the cache is initially empty.
Accesses to the src and dst arrays are the only sources of read and write misses, respectively.

For each row and column, indicate whether the access to src[row][col] and dst[row][col] is a hit(h) or a miss(m). For example, reading src[0][0] is a miss and writing dst[0][0] is also a miss:

\(dst \; array\) | | Col. 0 | Col. 1 | |--------|--------|--------| | Row 0 | m | | | Row 1 | | |

\(src \; array\) | | Col. 0 | Col. 1 | |--------|--------|--------| | Row 0 | m | | | Row 1 | | |

\(solution:\) Because the cache is too small to hold both arrays, references to one array keep evicting useful lines from the other array:

\(dst \; array\) | | Col. 0 | Col. 1 | |--------|--------|--------| | Row 0 | m | m | | Row 1 | m | m |

\(src \; array\) | | Col. 0 | Col. 1 | |--------|--------|--------| | Row 0 | m | m | | Row 1 | m | h |

2.Conflict Misses in Direct-Mapped Caches

Periodically misses in direct-mapped caches typically occur when programs access arrays whose sizes are a power of 2.

Take the following code as an example:

float dotprod(float x[8], float y[8])
{
    float sum = 0.0;
    int i;

    for (i = 0; i < 8; i++)
        sum += x[i] * y[i];
    
    return sum;
}

Suppose that: * floats are 4 bytes.

x is loaded into the 32 bytes of contiguous memory starting at address 0, and that y starts immediately after x at address 32.
A block is 16 bytes (big enough to hold four floats)
the cache consists of two sets, for a total cache size of 32 bytes.

Given these assumptions, each x[i] and y[i] will map to the identical cache set:

Element	Address	Set index	Element	Address	Set index
x[0]	0	0	y[0]	32	0
x[1]	4	0	y[1]	36	0
x[2]	8	0	y[2]	40	0
x[3]	12	0	y[3]	44	0
x[4]	16	1	y[4]	48	1
x[5]	20	1	y[5]	52	1
x[6]	24	1	y[6]	56	1
x[7]	28	1	y[7]	60	1

The first iteration of the loop references x[0], the block containing x[0]–x[3] is loaded into set 0.
The next reference is to y[0], another miss that causes the block containing y[0]–y[3] to be copied into set 0, overwriting the values of x that were copied in by the previous reference.

The term thrashing describes any situation where a cache is repeatedly loading and evicting the same sets of cache blocks.

One easy solution to this problem is to put B bytes of padding at the end of each array. For example, instead of defining x to be float x[8], we define it to be float x[12]:

Element	Address	Set index	Element	Address	Set index
x[0]	0	0	y[0]	48	1
x[1]	4	0	y[1]	52	1
x[2]	8	0	y[2]	56	1
x[3]	12	0	y[3]	60	1
x[4]	16	1	y[4]	64	0
x[5]	20	1	y[5]	68	0
x[6]	24	1	y[6]	72	0
x[7]	28	1	y[7]	76	0

x[i] and y[i] now map to different sets