4.3.Sequential Y86-64 Implementation

Posted on 2024-07-25 | Updated on 2024-07-26 | Categories: CSAPP, Chapter 4.Processor Architecture | Word count: 2.4k | Reading time ≈ 9 min


1.Organizing Processing into Stages

(a.) Summary

The following is an informal description of the stages and the operations performed within them:

Fetch: The fetch stage reads the bytes of an instruction from memory, using the program counter (PC) as the memory address.
- From the instruction it extracts the two 4-bit portions of the instruction specifier byte, referred to as icode (the instruction code) and ifun (the instruction function).
- It possibly fetches a register specifier byte, giving one or both of the register operand specifiers rA and rB, and possibly an 8-byte constant word valC.
- It computes valP to be the address of the instruction following the current one in sequential order. That is, valP equals the value of the PC plus the length of the fetched instruction.
Decode: The decode stage reads up to two operands from the register file, giving values valA and/or valB.
Execute: In the execute stage, the arithmetic/logic unit (ALU) does one of three things: it performs the operation specified by the instruction (according to the value of ifun), computes the effective address of a memory reference, or increments or decrements the stack pointer. We refer to the resulting value as valE. The condition codes are possibly set.
- For a conditional move instruction, the stage evaluates the condition codes and move condition (given by ifun). It enables the updating of the destination register only if the condition holds. Similarly, for a jump instruction, it determines whether or not the branch should be taken.
Memory: The memory stage may write data to memory, or it may read data from memory. We refer to the value read as valM.
Write Back: The write-back stage writes up to two results to the register file.
PC Update: The PC is set to the address of the next instruction.

In our simplified implementation, the processor stops when any exception occurs. That is, it stops when it executes a halt or invalid instruction, or when it attempts to read or write an invalid address.

(b.) Arithmetic operations

For the integer-operation instructions (OPq):
1. In the fetch stage, we do not require a constant word, and so valP is computed as PC+2 (one instruction byte plus one register specifier byte).
2. During the decode stage, we read both operands. These are supplied to the ALU in the execute stage, along with the function specifier ifun.
3. valE equals the instruction result valB OP valA, where OP is specified by ifun. In the write-back stage, valE is written to register rB.
For the rrmovq and irmovq instructions:
1. rrmovq proceeds much like an integer operation, but we don't need to fetch the second register operand. Instead, we set the second ALU input to zero and add this to the first, giving valE = valA, which is then written to register rB.
2. irmovq is processed similarly, except that the constant valC is used as the first ALU input, giving valE = valC. In addition, we must increment the program counter by 10 for irmovq due to the long instruction format.
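The stage values for these three instruction types can be traced with a small Python sketch. The helper `trace_arith` is hypothetical, not part of the CSAPP tools. Memory is assumed to be a Python `bytes` object and registers a dict keyed by register ID. Only the unconditional `rrmovq` is modeled; conditional moves are not.

```python
MASK = (1 << 64) - 1  # keep results to 64 bits

# ifun -> valB OP valA for OPq (addq, subq, andq, xorq)
OPS = {0: lambda b, a: b + a, 1: lambda b, a: b - a,
       2: lambda b, a: b & a, 3: lambda b, a: b ^ a}

def trace_arith(mem, pc, regs):
    """Hypothetical tracer for OPq, rrmovq and irmovq; returns valE and valP."""
    icode, ifun = mem[pc] >> 4, mem[pc] & 0xF          # fetch: instruction byte
    rA, rB = mem[pc + 1] >> 4, mem[pc + 1] & 0xF       # fetch: register byte
    if icode == 0x6:                                   # OPq rA, rB
        valA, valB = regs[rA], regs[rB]                # decode: both operands
        valE = OPS[ifun](valB, valA) & MASK            # execute: valB OP valA
        valP = pc + 2
    elif icode == 0x2:                                 # rrmovq rA, rB (no cmov here)
        valA = regs[rA]                                # decode: one operand
        valE = (valA + 0) & MASK                       # execute: valA + 0
        valP = pc + 2
    elif icode == 0x3:                                 # irmovq V, rB
        valC = int.from_bytes(mem[pc + 2:pc + 10], "little")  # fetch: constant word
        valE = (valC + 0) & MASK                       # execute: valC + 0
        valP = pc + 10                                 # long format: 10 bytes
    else:
        raise ValueError(f"not handled in this sketch: icode {icode:#x}")
    regs[rB] = valE                                    # write back via port E
    return {"valE": valE, "valP": valP}                # PC update: new PC = valP

regs = {0x0: 3, 0x2: 4}                                # %rax = 3, %rdx = 4
print(trace_arith(bytes([0x60, 0x02]), 0, regs))       # addq %rax, %rdx
# {'valE': 7, 'valP': 2}
```

Note that `subq` computes valB - valA, so the operand order in `OPS` matters.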

(c.) Operations involving memory access

The rmmovq and mrmovq instructions both involve the memory stage.

Both use the ALU to add valC to valB, giving the effective address (displacement plus base register value) for the memory operation.
In the memory stage, rmmovq writes the register value valA to memory. mrmovq instead reads valM from memory and writes it to register rA in the write-back stage.

(d.) `pushq` & `popq` operations

The pushq and popq involve both accessing memory and incrementing or decrementing the stack pointer.

For the pushq instruction:
1. In the decode stage, we read register rA as valA (the value to be pushed) and use %rsp as the identifier for the second register operand, giving the stack pointer as valB.
2. In the execute stage, we use the ALU to decrement the stack pointer by 8. This decremented value is used for the memory write address and is also stored back to %rsp in the write-back stage.
3. In the memory stage, we use valE as the address for the write operation and write valA there.

We adhere to the Y86-64 convention that pushq should decrement the stack pointer before writing. This holds even though the actual updating of the stack pointer does not occur until after the memory operation has completed.

For the popq instruction:
1. The popq instruction proceeds much like pushq, except that we read two copies of the stack pointer in the decode stage. This is clearly redundant, but we will see that having the stack pointer as both valA and valB makes the subsequent flow more similar to that of other instructions, enhancing the overall uniformity of the design.
2. We use the unincremented value valA as the address for the memory read, while the ALU computes the incremented value valE = valB + 8.
3. In the write-back stage, we update both the stack pointer register with the incremented stack pointer and register rA with the value read from memory.
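The push and pop flows can be sketched in Python using the same hypothetical conventions as a dict-based register file and data memory. In this sketch the port E write happens before the port M write. As a result, `popq %rsp` leaves %rsp holding the popped value, assumed here to match the Y86-64 convention.

```python
MASK = (1 << 64) - 1
RRSP = 0x4  # register ID of %rsp

def trace_push_pop(mem, pc, regs, data):
    """Hypothetical tracer for pushq rA and popq rA (both 2 bytes long)."""
    icode, rA = mem[pc] >> 4, mem[pc + 1] >> 4
    if icode == 0xA:                       # pushq rA
        valA, valB = regs[rA], regs[RRSP]  # decode: value to push, stack pointer
        valE = (valB - 8) & MASK           # execute: decrement first
        data[valE] = valA                  # memory: write at the new top
        regs[RRSP] = valE                  # write back via port E
    elif icode == 0xB:                     # popq rA
        valA = valB = regs[RRSP]           # decode: two copies of %rsp
        valE = (valB + 8) & MASK           # execute: increment
        valM = data[valA]                  # memory: read at the old top
        regs[RRSP] = valE                  # write back via port E ...
        regs[rA] = valM                    # ... then port M, so port M wins on a conflict
    else:
        raise ValueError(f"not handled in this sketch: icode {icode:#x}")
    return pc + 2

regs, data = {0x0: 5, 0x3: 0, RRSP: 0x200}, {}
trace_push_pop(bytes([0xA0, 0x0F]), 0, regs, data)   # pushq %rax
trace_push_pop(bytes([0xB0, 0x3F]), 0, regs, data)   # popq %rbx
```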

(e.) Control transfer instructions

A jump instruction proceeds through fetch and decode much like the previous instructions, except that it does not require a register specifier byte.
In the execute stage, we check the condition codes and the jump condition to determine whether or not to take the branch, yielding a 1-bit signal Cnd.
In the PC update stage, we test this flag and set the PC to valC (the jump target) if the flag is 1, or to valP (the address of the following instruction) if the flag is 0.
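The Cnd computation and PC selection for jXX can be sketched in Python. The function names are hypothetical. The ifun encoding is assumed to be the Y86-64 one (0 jmp, 1 le, 2 l, 3 e, 4 ne, 5 ge, 6 g), which conditional moves share.

```python
def cond(ifun, zf, sf, of):
    """Jump/move condition computed from the condition codes ZF, SF, OF."""
    lt = bool(sf) != bool(of)            # signed "less than"
    return {
        0: True,                         # jmp / rrmovq: always
        1: lt or bool(zf),               # le
        2: lt,                           # l
        3: bool(zf),                     # e
        4: not zf,                       # ne
        5: not lt,                       # ge
        6: not lt and not zf,            # g
    }[ifun]

def jump_next_pc(mem, pc, zf, sf, of):
    """jXX Dest is 9 bytes: instruction byte plus the 8-byte target valC."""
    ifun = mem[pc] & 0xF
    valC = int.from_bytes(mem[pc + 1:pc + 9], "little")
    valP = pc + 9
    cnd = cond(ifun, zf, sf, of)         # execute stage
    return valC if cnd else valP         # PC update stage

je = bytes([0x73]) + (0x40).to_bytes(8, "little")   # je 0x40
```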

(f.) `call` & `ret` operations

Instructions call and ret are similar to pushq and popq, except that they push and pop program counter values.
- With instruction call, we push valP, the address of the instruction that follows the call instruction. During the PC update stage, we set the PC to valC, the call destination.
- With instruction ret, we assign valM, the value popped from the stack, to the PC in the PC update stage.
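The call/ret pair can be sketched in Python as a push or pop of a PC value, under the same assumptions as the pushq/popq sketch. Memory is a byte sequence, registers and data memory are dicts, and `trace_call_ret` is a hypothetical name.

```python
MASK = (1 << 64) - 1
RRSP = 0x4

def trace_call_ret(mem, pc, regs, data):
    """Hypothetical tracer for call Dest (9 bytes) and ret (1 byte); returns the new PC."""
    icode = mem[pc] >> 4
    if icode == 0x8:                                         # call Dest
        valC = int.from_bytes(mem[pc + 1:pc + 9], "little")  # call target
        valP = pc + 9                                        # return address
        valE = (regs[RRSP] - 8) & MASK                       # like pushq
        data[valE] = valP                                    # push valP
        regs[RRSP] = valE
        return valC                                          # PC <- valC
    if icode == 0x9:                                         # ret
        valA = regs[RRSP]
        valE = (valA + 8) & MASK                             # like popq
        valM = data[valA]                                    # popped return address
        regs[RRSP] = valE
        return valM                                          # PC <- valM
    raise ValueError(f"not handled in this sketch: icode {icode:#x}")

prog = bytearray(0x21)
prog[0] = 0x80                                   # call 0x20
prog[1:9] = (0x20).to_bytes(8, "little")
prog[0x20] = 0x90                                # ret
regs, data = {RRSP: 0x100}, {}
pc = trace_call_ret(prog, 0, regs, data)         # jumps to 0x20
pc = trace_call_ret(prog, pc, regs, data)        # returns to 9
```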

2.SEQ Hardware Structure

The following figure shows an abstract view of a hardware structure that performs the six stages:

Information flows along wires (shown grouped together as a heavy gray line), first upward and then around to the right.
The feedback paths coming back down on the right-hand side contain the updated values to write to the register file and the updated program counter.

The hardware units for the six stages work as follows:

Fetch: Using the program counter register as an address, the instruction memory reads the bytes of an instruction. The PC incrementer computes valP.
Decode: The two register values valA and valB are read simultaneously from the read ports A and B.
Execute:
- The ALU performs different operations depending on the instruction type. For an integer operation it applies the operation given by ifun. Otherwise it performs an addition, either to compute an effective address, to adjust the stack pointer, or to pass a value through for a move.
- The condition code register (CC) holds the three condition code bits. New values for the condition codes are computed by the ALU. Whether a conditional move updates its destination register, and the Cnd signal of a jump instruction, are both computed from the CC.
Memory: The data memory reads or writes a word of memory when executing a memory instruction.
- The instruction and data memories access the same memory locations, but for different purposes.
Write Back: The register file has two write ports. Port E is used to write values computed by the ALU, while port M is used to write values read from the data memory.
PC Update: The new value of the program counter is selected to be either valP, the address of the next instruction, valC, the destination address specified by a call or jump instruction, or valM, the return address read from memory.

The following figure gives a more detailed view of the hardware design:

3.SEQ Timing

(a.) Some basic ideas

Combinational logic does not require any sequencing or control: values propagate through a network of logic gates whenever the inputs change.
We assume that reading from a random access memory operates much like combinational logic, with the output word generated based on the address input.
The program counter is loaded with a new instruction address every clock cycle.
The condition code register is loaded only when an integer operation instruction is executed.
The data memory is written only when an rmmovq, pushq, or call instruction is executed.
The two write ports of the register file allow two program registers to be updated on every cycle, but we can use the special register ID 0xF as a port address to indicate that no write should be performed for this port.
Principle: The processor never needs to read back the state updated by an instruction in order to complete the processing of this instruction.
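The timing rules above can be illustrated with a small Python model of a clocked register file. Reads behave like combinational logic, while writes are only committed at the clock edge. Because of this, an instruction can never read back a value it is writing in the same cycle. The class name is hypothetical, and returning 0 for a read of RNONE is an assumption of this sketch.

```python
RNONE = 0xF  # register ID meaning "no register"

class RegisterFile:
    """Sketch of SEQ's clocked register file (15 program registers, IDs 0x0-0xE)."""

    def __init__(self):
        self.regs = [0] * 15
        self._pending = []               # writes waiting for the next clock edge

    def read(self, src):
        # Reads behave like combinational logic; RNONE yields 0 in this sketch.
        return 0 if src == RNONE else self.regs[src]

    def write(self, dstE, valE, dstM, valM):
        # Port E, then port M; an ID of RNONE on a port means "no write".
        for dst, val in ((dstE, valE), (dstM, valM)):
            if dst != RNONE:
                self._pending.append((dst, val))

    def clock(self):
        # Rising edge: commit all writes at once (a later port-M write wins).
        for dst, val in self._pending:
            self.regs[dst] = val
        self._pending.clear()

rf = RegisterFile()
rf.write(0x2, 7, RNONE, 0)   # e.g. the write-back of irmovq $7, %rdx
before = rf.read(0x2)        # still 0: the write has not been clocked in
rf.clock()
after = rf.read(0x2)         # now 7
```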

(b.) Program example

Every time the clock transitions from low to high, the processor begins executing a new instruction.

4.SEQ Stage Implementation

The constants used in the HCL descriptions are shown below:

(a.) Fetch stage

The instruction memory hardware unit reads 10 bytes from memory at a time, using the PC as the address of the first byte (byte 0). This byte is interpreted as the instruction byte and is split (by the unit labeled "Split") into two 4-bit quantities.
The control logic blocks labeled "icode" and "ifun" then compute the instruction and function codes. These equal the values read from memory, unless the instruction address is invalid (as indicated by the signal imem_error). In that case they take the values corresponding to a nop instruction.

Based on the value of icode, we can compute three 1-bit signals:

instr_valid: This signal is used to detect an illegal instruction.
need_regids: Does this instruction include a register specifier byte?
need_valC: Does this instruction include a constant word?

The signals instr_valid and imem_error (generated when the instruction address is out of bounds) are used to generate the status code in the memory stage.

The HCL descriptions of need_regids and need_valC are as follows:

bool need_regids =
    icode in { IRRMOVQ, IOPQ, IPUSHQ, IPOPQ,
               IIRMOVQ, IRMMOVQ, IMRMOVQ };

bool need_valC =
    icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ, IJXX, ICALL };

The remaining 9 bytes read from the instruction memory encode some combination of the register specifier byte and the constant word.
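- Byte 1 is split into register specifiers rA and rB when the computed signal need_regids is 1. If need_regids is 0, both register specifiers are set to 0xF (RNONE), indicating that the instruction specifies no registers.
- The unit labeled "Align" also generates the constant word valC. It is taken from bytes 1 to 8 when need_regids is 0, or from bytes 2 to 9 when need_regids is 1.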
The PC incrementer hardware unit generates the signal valP, based on the current value of the PC, and the two signals need_regids and need_valC. For PC value \(p\), need_regids value \(r\), and need_valC value \(i\), the incrementer generates the value \(p+1+r+8i\).
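The fetch logic can be sketched in Python, including need_regids, need_valC, the Align step and the p+1+r+8i incrementer. The icode values follow the Y86-64 encoding (halt 0x0 through popq 0xB). Treating any PC outside the `imem` buffer as an `imem_error` is an assumption of this sketch. In this sketch valC is computed even when need_valC is 0, where the hardware value would simply be ignored.

```python
IHALT, INOP, IRRMOVQ, IIRMOVQ, IRMMOVQ, IMRMOVQ, IOPQ, IJXX, ICALL, IRET, IPUSHQ, IPOPQ = range(12)
RNONE = 0xF

def fetch(imem, pc):
    """Sketch of the SEQ fetch stage over a bytes-like instruction memory."""
    imem_error = not (0 <= pc < len(imem))          # assumed bounds check
    window = bytes(10) if imem_error else bytes(imem[pc:pc + 10]).ljust(10, b"\x00")
    # Split byte 0; an invalid address turns the instruction into a nop.
    icode, ifun = (INOP, 0) if imem_error else (window[0] >> 4, window[0] & 0xF)
    instr_valid = icode in range(IHALT, IPOPQ + 1)
    need_regids = icode in {IRRMOVQ, IOPQ, IPUSHQ, IPOPQ, IIRMOVQ, IRMMOVQ, IMRMOVQ}
    need_valC = icode in {IIRMOVQ, IRMMOVQ, IMRMOVQ, IJXX, ICALL}
    # Align: byte 1 holds rA:rB only when need_regids is set.
    rA, rB = (window[1] >> 4, window[1] & 0xF) if need_regids else (RNONE, RNONE)
    start = 2 if need_regids else 1
    valC = int.from_bytes(window[start:start + 8], "little")
    # PC incrementer: p + 1 + r + 8i
    valP = pc + 1 + int(need_regids) + 8 * int(need_valC)
    return dict(icode=icode, ifun=ifun, rA=rA, rB=rB, valC=valC, valP=valP,
                instr_valid=instr_valid, imem_error=imem_error)

# irmovq $10, %rdx at address 0, then ret at address 10
imem = bytes([0x30, 0xF2]) + (10).to_bytes(8, "little") + bytes([0x90])
```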

(b.) Decode and write-back stages

The register file has four ports. It supports up to two simultaneous reads (on ports A and B) and two simultaneous writes (on ports E and M).
- Each port has both an address connection and a data connection, where the address connection is a register ID, and the data connection is a set of 64 wires serving as either an output word (for a read port) or an input word (for a write port) of the register file.
- The two read ports have address inputs srcA and srcB, while the two write ports have address inputs dstE and dstM. The ID 0xF (RNONE) on an address port indicates that no register should be accessed.
The four blocks at the bottom generate the four different register IDs for the register file based on icode, rA, rB and Cnd.
- srcA indicates which register should be read to generate valA, and srcB likewise for valB.
- dstE indicates the destination register for write port E, where the computed value valE is stored. dstM likewise indicates the destination for write port M, where valM is stored.

The HCL descriptions of srcA and srcB are as follows:

word srcA = [
    icode in { IRRMOVQ, IRMMOVQ, IOPQ, IPUSHQ } : rA;
    icode in { IPOPQ, IRET } : RRSP;
    1 : RNONE; # Don't need register
];

word srcB = [
    icode in { IOPQ, IRMMOVQ, IMRMOVQ } : rB;
    icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
    1 : RNONE; # Don't need register
];

The HCL description of dstM is as below:

word dstM = [
    icode in { IMRMOVQ, IPOPQ } : rA;
    1 : RNONE; # Don't write any register
];
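Each HCL case expression above tries its cases in order and takes the first that matches. Each one therefore translates directly into a Python `if` chain. The sketch below mirrors srcA, srcB and dstM. The icode values are the Y86-64 encoding and %rsp has register ID 4.

```python
IRRMOVQ, IIRMOVQ, IRMMOVQ, IMRMOVQ, IOPQ, IJXX, ICALL, IRET, IPUSHQ, IPOPQ = range(2, 12)
RRSP, RNONE = 0x4, 0xF

def src_a(icode, rA):
    if icode in {IRRMOVQ, IRMMOVQ, IOPQ, IPUSHQ}:
        return rA
    if icode in {IPOPQ, IRET}:
        return RRSP
    return RNONE                      # don't need register

def src_b(icode, rB):
    if icode in {IOPQ, IRMMOVQ, IMRMOVQ}:
        return rB
    if icode in {IPUSHQ, IPOPQ, ICALL, IRET}:
        return RRSP
    return RNONE                      # don't need register

def dst_m(icode, rA):
    return rA if icode in {IMRMOVQ, IPOPQ} else RNONE
```

For example, `popq %rbx` reads %rsp on both ports and writes the popped value back through port M to %rbx.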

(c.) Execute stage

This stage includes the ALU, which performs the operation selected by the alufun signal on its inputs aluA and aluB. The ALU output becomes the signal valE.
The value of aluA can be valA, valC, or either -8 or +8, depending on the instruction type, while aluB can be either valB or 0.

They can be described by the following HCL code:

word aluA = [
    icode in { IRRMOVQ, IOPQ } : valA;
    icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ } : valC;
    icode in { ICALL, IPUSHQ } : -8;
    icode in { IRET, IPOPQ } : 8;
    # Other instructions don't need ALU
];

word aluB = [
    icode in { IRMMOVQ, IMRMOVQ, IOPQ, ICALL,
    IPUSHQ, IRET, IPOPQ } : valB;
    icode in { IRRMOVQ, IIRMOVQ } : 0;
    # Other instructions don't need ALU
];
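The two ALU input selectors can be sketched in Python, again as ordered `if` chains. Returning `None` for instructions that don't use the ALU is a choice of this sketch. The last line shows how pushq turns valB + (-8) into the decremented stack pointer.

```python
IRRMOVQ, IIRMOVQ, IRMMOVQ, IMRMOVQ, IOPQ, IJXX, ICALL, IRET, IPUSHQ, IPOPQ = range(2, 12)
MASK = (1 << 64) - 1

def alu_a(icode, valA, valC):
    if icode in {IRRMOVQ, IOPQ}:
        return valA
    if icode in {IIRMOVQ, IRMMOVQ, IMRMOVQ}:
        return valC
    if icode in {ICALL, IPUSHQ}:
        return -8
    if icode in {IRET, IPOPQ}:
        return 8
    return None                       # other instructions don't need the ALU

def alu_b(icode, valB):
    if icode in {IRMMOVQ, IMRMOVQ, IOPQ, ICALL, IPUSHQ, IRET, IPOPQ}:
        return valB
    if icode in {IRRMOVQ, IIRMOVQ}:
        return 0
    return None                       # other instructions don't need the ALU

new_rsp = (alu_b(IPUSHQ, 0x200) + alu_a(IPUSHQ, 0, 0)) & MASK   # pushq with %rsp = 0x200
```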

For most instructions the ALU serves as an adder. For OPq instructions, however, we want it to use the operation encoded in the ifun field of the instruction. So we describe the ALU control as follows:

word alufun = [
    icode == IOPQ : ifun;
    1 : ALUADD;
];

We only want to set the condition codes when an OPq instruction is executed. The logic is described in HCL as below:

bool set_cc = icode in { IOPQ };
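alufun, set_cc and the ALU itself can be modeled together in Python. The flag rules used here are assumptions of this sketch: ZF for a zero result, SF for a negative result, two's-complement overflow for add and subtract, and OF cleared for and/xor. The function names are hypothetical.

```python
MASK = (1 << 64) - 1
ALUADD, ALUSUB, ALUAND, ALUXOR = range(4)   # same order as the OPq ifun codes
IOPQ = 0x6

def to_signed(x):
    """Interpret the low 64 bits of x as a two's-complement integer."""
    x &= MASK
    return x - (1 << 64) if x >> 63 else x

def alufun(icode, ifun):
    return ifun if icode == IOPQ else ALUADD    # ALUADD for everything but OPq

def set_cc(icode):
    return icode == IOPQ

def alu(fun, aluA, aluB):
    """Compute valE = aluB OP aluA and the flags (ZF, SF, OF) it would produce."""
    a, b = to_signed(aluA), to_signed(aluB)
    if fun == ALUADD:
        t = to_signed(b + a)
        of = (a < 0) == (b < 0) and (t < 0) != (a < 0)
    elif fun == ALUSUB:
        t = to_signed(b - a)
        of = (a < 0) != (b < 0) and (t < 0) != (b < 0)
    elif fun == ALUAND:
        t, of = a & b, False
    elif fun == ALUXOR:
        t, of = a ^ b, False
    else:
        raise ValueError(f"bad ALU function {fun}")
    return t & MASK, (t == 0, t < 0, of)
```

In SEQ the flags returned here would be latched into CC only when `set_cc` is true.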

The hardware unit labeled "cond" uses a combination of the condition codes and the function code to determine whether a conditional branch or data transfer should take place.
- It generates the Cnd signal used both for the setting of dstE with conditional moves and in the next PC logic for conditional branches.

By using the Cnd signal, we can describe dstE as below:

word dstE = [
    icode in { IRRMOVQ } && Cnd : rB;
    icode in { IIRMOVQ, IOPQ} : rB;
    icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
    1 : RNONE; # Don't write any register
];
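The dstE logic can be sketched in Python with Cnd as an input. The Cnd value would come from the condition unit. Any conditional move whose condition fails falls through to RNONE, so nothing is written.

```python
IRRMOVQ, IIRMOVQ, IRMMOVQ, IMRMOVQ, IOPQ, IJXX, ICALL, IRET, IPUSHQ, IPOPQ = range(2, 12)
RRSP, RNONE = 0x4, 0xF

def dst_e(icode, rB, cnd):
    if icode == IRRMOVQ and cnd:      # rrmovq, or a cmovXX whose condition holds
        return rB
    if icode in {IIRMOVQ, IOPQ}:
        return rB
    if icode in {IPUSHQ, IPOPQ, ICALL, IRET}:
        return RRSP
    return RNONE                      # includes a cmovXX whose condition fails
```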

(d.) Memory stage

Two control blocks generate the values for the memory address and the memory input data(for write operations).
Two other blocks generate the control signals indicating whether to perform a read or a write operation.
When a read operation is performed, the data memory generates the value valM.

The HCL descriptions of mem_addr and mem_data are as follows:

word mem_addr = [
    icode in { IRMMOVQ, IPUSHQ, ICALL, IMRMOVQ } : valE;
    icode in { IPOPQ, IRET } : valA;
    # Other instructions don't need address
];

word mem_data = [
    # Value from register
    icode in { IRMMOVQ, IPUSHQ } : valA;
    # Return PC
    icode == ICALL : valP;
    # Default: Don't write anything
];

We set the signal mem_read only for instructions that read data from memory, and mem_write only for instructions that write data to memory:

bool mem_read = icode in { IMRMOVQ, IPOPQ, IRET };

bool mem_write = icode in { IRMMOVQ, IPUSHQ, ICALL };
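The four memory-stage control blocks can be sketched in Python. `None` here stands for "no address" or "nothing to write", which is a choice of this sketch. The HCL leaves those cases unspecified.

```python
IRRMOVQ, IIRMOVQ, IRMMOVQ, IMRMOVQ, IOPQ, IJXX, ICALL, IRET, IPUSHQ, IPOPQ = range(2, 12)

def mem_addr(icode, valE, valA):
    if icode in {IRMMOVQ, IPUSHQ, ICALL, IMRMOVQ}:
        return valE                   # address computed by the ALU
    if icode in {IPOPQ, IRET}:
        return valA                   # unincremented stack pointer
    return None                       # no address needed

def mem_data(icode, valA, valP):
    if icode in {IRMMOVQ, IPUSHQ}:
        return valA                   # value from register
    if icode == ICALL:
        return valP                   # return PC
    return None                       # nothing to write

def mem_read(icode):
    return icode in {IMRMOVQ, IPOPQ, IRET}

def mem_write(icode):
    return icode in {IRMMOVQ, IPUSHQ, ICALL}
```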

A final function for the memory stage is to compute the status code Stat. It is generated from icode, imem_error, instr_valid and dmem_error. It's described in HCL as below:

## Determine instruction status
word Stat = [
    imem_error || dmem_error : SADR;
    !instr_valid: SINS;
    icode == IHALT : SHLT;
    1 : SAOK;
];
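The Stat logic can be sketched in Python as follows. The order of the checks encodes priority: an address error outranks an invalid instruction, which outranks halt. String names stand in for the numeric status codes, an assumption of this sketch.

```python
IHALT = 0x0

def stat(icode, imem_error, instr_valid, dmem_error):
    """Status code for the current instruction; cases are tried in order."""
    if imem_error or dmem_error:
        return "SADR"
    if not instr_valid:
        return "SINS"
    if icode == IHALT:
        return "SHLT"
    return "SAOK"
```

A simulator loop built on these sketches would keep executing instructions only while `stat(...)` returns `"SAOK"`.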

(e.) PC update stage

word new_pc = [
    # Call. Use instruction constant
    icode == ICALL : valC;
    # Taken branch. Use instruction constant
    icode == IJXX && Cnd : valC;
    # Completion of RET instruction. Use value from stack
    icode == IRET : valM;
    # Default: Use incremented PC
    1 : valP;
];
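The new_pc selector can be sketched in Python in the same style as the earlier stages. The icode values are assumed to be the Y86-64 encoding.

```python
IJXX, ICALL, IRET = 0x7, 0x8, 0x9

def new_pc(icode, cnd, valC, valM, valP):
    if icode == ICALL:
        return valC                  # call: use instruction constant
    if icode == IJXX and cnd:
        return valC                  # taken branch: use instruction constant
    if icode == IRET:
        return valM                  # ret: use value popped from the stack
    return valP                      # default: use incremented PC
```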