3.11.Floating-Point Code
1.Floating-Point Movement and Conversion Operations
\(a.\)Basic instructions
For floating-point, the data are held either in memory (indicated in the table as \(M_{32}\) and \(M_{64}\)) or in XMM registers (shown in the table as X).
- Each YMM register is 32 bytes long. When operating on scalar data, these registers only hold floating-point data, and only the low-order 32 bits (for float) or 64 bits (for double) are used.
- The 'a' in the instruction names vmovaps and vmovapd stands for 'aligned'.
- When converting floating-point values to integers, the two-operand floating-point conversion operations perform truncation, rounding values toward zero.
For three-operand floating-point conversion operations, we can ignore the second operand, since its value only affects the upper bytes of the result. In common use, the second source and the destination operands are identical:
vcvtsi2sdq %rax, %xmm1, %xmm1
This instruction reads a long integer from register %rax, converts it to data type double, and stores the result in the lower bytes of XMM register %xmm1.
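At the C level, these conversions are ordinary casts. The following is a minimal sketch of the semantics only; the function names `to_long_trunc` and `to_double` are hypothetical, and which instructions a compiler actually emits for them depends on the compiler and its flags.

```c
/* Hypothetical helpers illustrating the C semantics of the conversions. */

/* double -> long: C defines this cast as truncation toward zero,
 * which matches the truncating conversion instructions described above. */
long to_long_trunc(double x) {
    return (long) x;
}

/* long -> double: the kind of conversion performed by vcvtsi2sdq. */
double to_double(long n) {
    return (double) n;
}
```

For instance, `to_long_trunc(2.9)` gives 2 and `to_long_trunc(-2.9)` gives -2: the fractional part is dropped in both cases rather than rounded to the nearest integer.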
\(b.\)Conversion between floating-point formats
To convert between two different floating-point formats, suppose the low-order 4 bytes of %xmm0 hold a single-precision value. It would then seem straightforward to convert this to a double-precision value and store the result in the lower 8 bytes of register %xmm0 with a single instruction:
vcvtss2sd %xmm0, %xmm0, %xmm0
However, GCC will generate the following code:
# Conversion from single to double precision
The vunpcklps instruction interleaves the values in two XMM registers and stores them in a third. That is, if one source register contains words \([s_3, s_2, s_1, s_0]\) and the other contains words \([d_3, d_2, d_1, d_0]\), then the value of the destination register will be \([s_1, d_1, s_0, d_0]\). The vcvtps2pd instruction expands the two low-order single-precision values in the source XMM register to be the two double-precision values in the destination XMM register. Applying this to the result of the preceding vunpcklps instruction would give values \([dx_0,dx_0]\), where \(dx_0\) is the result of converting \(x\) to double precision.
GCC generates similar code for converting from double precision to single precision:
# Conversion from double to single precision
rather than by using the single instruction:
vcvtsd2ss %xmm0, %xmm0, %xmm0
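Whichever instruction sequence the compiler picks, the C-level behaviour is the same: widening a float to a double is exact, while narrowing a double to a float may round. A small sketch, with hypothetical function names `widen` and `narrow`:

```c
/* Hypothetical helpers showing the two conversion directions in C. */

/* float -> double: every float value is exactly representable as a double. */
double widen(float f) {
    return (double) f;
}

/* double -> float: the value is rounded to the nearest representable float. */
float narrow(double d) {
    return (float) d;
}
```

A round trip that starts from a float, such as `narrow(widen(0.1f))`, returns the original value. A round trip that starts from a double, such as `widen(narrow(0.1))`, generally does not, because the low-order bits of the double are lost when narrowing.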
2.Floating-Point Code in Procedures
The following conventions are observed:
- Up to eight floating-point arguments can be passed in XMM registers %xmm0 through %xmm7. These registers are used in the order the arguments are listed. Additional floating-point arguments can be passed on the stack.
- A function that returns a floating-point value does so in register %xmm0.
- All XMM registers are caller saved. The callee may overwrite any of these registers without first saving it.
When a function contains a combination of pointer, integer, and floating-point arguments, the pointers and integers are passed in general-purpose registers, while the floating-point values are passed in XMM registers. This means that the mapping of arguments to registers depends on both their types and their ordering.
\(e.g.\)
double g1(double a, long b, float c, int d)
Registers: a in %xmm0, b in %rdi, c in %xmm1, d in %esi.
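The mapping rule can be simulated with two independent counters, one per register class. The sketch below is only a model of the convention, not compiler code: the names `arg_kind_t` and `assign_registers` are hypothetical, and the integer register order (%rdi, %rsi, %rdx, %rcx, %r8, %r9) is assumed from the standard x86-64 calling convention rather than stated in this section.

```c
#include <stddef.h>

/* Class of each argument: integer/pointer or floating point. */
typedef enum { ARG_INT, ARG_FP } arg_kind_t;

/* Registers in the order they are handed out. 64-bit names are used for
 * the integer class; a 32-bit int would occupy the low half of the same
 * register (for example %esi rather than %rsi). */
static const char *const INT_REGS[6] = {
    "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
};
static const char *const FP_REGS[8] = {
    "%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5", "%xmm6", "%xmm7"
};

/* Store in out[i] the register name for argument i, or "stack" once the
 * registers of that class are used up. */
void assign_registers(const arg_kind_t *kinds, size_t n, const char **out) {
    size_t next_int = 0, next_fp = 0;
    for (size_t i = 0; i < n; i++) {
        if (kinds[i] == ARG_FP)
            out[i] = (next_fp < 8) ? FP_REGS[next_fp++] : "stack";
        else
            out[i] = (next_int < 6) ? INT_REGS[next_int++] : "stack";
    }
}
```

Calling it with the kinds `{ARG_FP, ARG_INT, ARG_FP, ARG_INT}`, which correspond to the arguments of g1, fills the output with %xmm0, %rdi, %xmm1, %rsi. This matches the assignment listed above, with d in the low half of %rsi.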
3.Floating-Point Arithmetic Operations
The first source operand S1 can be either an XMM register or a memory location.
The second source operand and the destination operand must be XMM registers.
Take the following C program as an example:
double funct(double a, float x, double b, int i) {
# double funct(double a, float x, double b, int i)
Bitwise operations are sometimes a useful way to manipulate floating-point values. The following are some examples:
\(a.\)Taking absolute value:
vmovsd .LC1(%rip), %xmm1
\(b.\)Set value to zero:
vxorpd %xmm0, %xmm0, %xmm0
\(c.\)Negate:
vmovsd .LC2(%rip), %xmm1
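The same three tricks can be written in C by operating on the bit pattern of a double. This is a sketch that assumes the IEEE 754 double format, where bit 63 is the sign bit; all function names are hypothetical, and the masks are the values one would expect a compiler to load from memory for the AND and XOR operations.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit pattern and back, using memcpy to
 * stay within defined behaviour. */
uint64_t bits_of(double x) {
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

double double_of(uint64_t b) {
    double x;
    memcpy(&x, &b, sizeof x);
    return x;
}

/* Absolute value: AND with a mask that clears only the sign bit. */
double abs_by_mask(double x) {
    return double_of(bits_of(x) & 0x7FFFFFFFFFFFFFFFULL);
}

/* Zero: XOR of a value with itself gives the all-zero pattern, i.e. +0.0. */
double zero_by_xor(double x) {
    uint64_t b = bits_of(x);
    return double_of(b ^ b);
}

/* Negation: XOR with a mask that flips only the sign bit. */
double negate_by_xor(double x) {
    return double_of(bits_of(x) ^ 0x8000000000000000ULL);
}
```

For example, `abs_by_mask(-3.5)` gives 3.5 and `negate_by_xor(2.0)` gives -2.0, without any floating-point arithmetic being involved.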
4.Defining and Using Floating-Point Constants
AVX floating-point operations cannot have immediate values as operands. Instead, the compiler must allocate and initialize storage for any constant values.
Take the following C program as an example:
double cel2fahr(double temp) {
cel2fahr:
The function reads the value 1.8 from the memory location labeled .LC2 and the value 32.0 from the memory location labeled .LC3.
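To see what such a stored constant looks like, the sketch below splits the IEEE 754 encoding of a double into its low-order and high-order 32-bit words. It assumes that the assembler stores a double constant on little-endian x86-64 as two 32-bit words, low-order word first; that layout is an assumption here, not something shown in the listing above. The names `low_word` and `high_word` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Low-order 32 bits of the IEEE 754 encoding of d. */
uint32_t low_word(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (uint32_t) (bits & 0xFFFFFFFFu);
}

/* High-order 32 bits (sign, exponent, top of the fraction) of d. */
uint32_t high_word(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (uint32_t) (bits >> 32);
}
```

For 1.8, whose encoding is 0x3FFCCCCCCCCCCCCD, the low word is 3435973837 and the high word is 1073532108. For 32.0, whose encoding is 0x4040000000000000, the low word is 0 and the high word is 1077936128.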
5.Floating-Point Comparison Operations
These instructions are similar to the cmp instructions for integers: they compare operands S1 and S2 and set the condition codes to indicate their relative values. As with cmpq, they follow the ATT-format convention of listing the operands in reverse order. Argument S2 must be in an XMM register, while S1 can be either in an XMM register or in memory.
The floating-point comparison instructions set three condition codes: the zero flag ZF, the carry flag CF, and the parity flag PF.
The parity flag is set when either operand is \(NaN\); this is called the unordered case.
By convention, any comparison in C is considered to fail when one of the arguments is \(NaN\), and this flag is used to detect such a condition. For example, even the comparison x == x yields 0 when x is \(NaN\).
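The \(NaN\) rule is easy to check directly in C. In this minimal sketch the function names are hypothetical, and the code assumes the compiler follows IEEE 754 comparison semantics (for example, that it is not run with an option such as -ffast-math).

```c
#include <math.h>

/* Returns a quiet NaN using the NAN macro from <math.h>. */
double quiet_nan(void) {
    return NAN;
}

/* 1 when x compares equal to itself, 0 otherwise (only NaN fails). */
int equals_self(double x) {
    return x == x;
}

/* 1 when at least one of <, ==, > holds, 0 in the unordered case. */
int is_ordered(double a, double b) {
    return (a < b) || (a == b) || (a > b);
}
```

With these definitions, `equals_self(1.0)` is 1 while `equals_self(quiet_nan())` is 0, and `is_ordered(quiet_nan(), 2.0)` is 0 because all three comparisons fail.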
There are also three related conditional jump instructions:
- jp: jumps when a floating-point comparison yields an unordered result (PF=1).
- ja: jumps when CF=0 and ZF=0.
- jb: jumps when CF=1.
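How the flags and the jumps fit together can be modelled in a few lines of C. The flag settings used below (all three flags set for unordered, only CF for S2 < S1, only ZF for S2 = S1, none for S2 > S1) are taken from the usual description of the x86 unordered-compare instructions and are an assumption relative to this text. The type `fp_flags_t` and the function names are hypothetical.

```c
#include <math.h>

/* Model of the three condition codes set by a floating-point comparison. */
typedef struct {
    int cf;  /* carry flag  */
    int zf;  /* zero flag   */
    int pf;  /* parity flag */
} fp_flags_t;

/* Model of comparing S2 with S1. */
fp_flags_t fp_compare(double s2, double s1) {
    fp_flags_t f = { 0, 0, 0 };
    if (isnan(s1) || isnan(s2)) {
        f.cf = f.zf = f.pf = 1;   /* unordered */
    } else if (s2 < s1) {
        f.cf = 1;
    } else if (s2 == s1) {
        f.zf = 1;
    }                             /* s2 > s1: all flags stay 0 */
    return f;
}

/* Jump conditions as listed above. */
int takes_jp(fp_flags_t f) { return f.pf == 1; }
int takes_ja(fp_flags_t f) { return f.cf == 0 && f.zf == 0; }
int takes_jb(fp_flags_t f) { return f.cf == 1; }
```

In this model an unordered comparison satisfies both jp and jb, since CF is also set. Code that must treat \(NaN\) separately therefore needs to test jp before relying on jb.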
Take the following C program as an example:
typedef enum {NEG, ZERO, POS, OTHER} range_t;
find_range: