5.10.Some Limiting Factors
Some Limiting Factors
1.Register Spilling
If a program has a degree of parallelism
Once the number of loop variables exceeds the number of available registers, the program must allocate some on the stack.
Take the following
1 | # Updating of accumulator acc0 in 10 x 10 urolling |
The comparable part of the code for
1 | # Updating of accumulator acc0 in 20 x 20 unrolling |
The accumulator is kept as a local variable on the stack, at offset
40 from the stack pointer. The program must read both its value
and the value of data[i]
from memory, multiply
them, and store the result back to memory.
2.Branch Prediction
gcc don't use conditional transfers of control to generate code for branch prediction, but rather compute the values along both branches of a conditional expression or statement and then use conditional moves to select the desired value.
To illustrate this, we use the program with branch prediction and that without predition as examples:
1 | void combine4(vec_ptr v, data_t *dest) { |
1 | /* Include bounds check in loop */ |
The additional computation required to perform bounds checking can take place in parallel with the combining operations. The processor is able to predict the outcomes of these branches, and so none of this evaluation has much effect on the fetching and processing of the instructions that form the critical path in the program execution.
3.Write Code Suitable for Implementation with Conditional Moves
- gcc is able to generate conditional moves for code written in a more "functional" style, where we use conditional operations to compute values and then update the program state with these values, as opposed to a more "imperative" style, where we use conditionals to selectively update program state.
Take the program that set a[i]
to be the maximum of
a[i], b[i]
and b[i]
the minimum.
1 | /* imperative style */ |
1 | /* functional style */ |
The method is like data transfer, we use less conditional expression to reduce misprediction rate.
Gitalking ...