3.8.Array Allocation and Access

\(3.8.\)Array Allocation and Access

1.Pointer Arithmetic

  Suppose the starting address of integer array E and integer index i are stored in registers %rdx and %rcx. Below are some assembly-code implementation of array expressions, with the result being stored in either %eax(for data) or register %rax(for pointers):

The dereference of a pointer is through (register_name).

The final example shows that one can compute the difference of two pointers within the same data structure, with the result being data having type long and value equal to the difference of the two addresses divided by the size of the data type.

2.Nested Arrays

  The array elements are ordered in memory in row-major order:

  Array element M[i][j] can be easily reached through a series of arithmetic operations. For example:

1
2
3
4
# A in %rdi, i in %rsi, and j in %rdx
leaq (%rsi,%rsi,2), %rax # Compute 3i
leaq (%rdi,%rax,4), %rax # Compute xA + 12i
movl (%rax,%rdx,4), %eax # Read from M[xA + 12i + 4]

3.Fixed-sized Arrays

  The C compiler is able to make many optimizations for code operating on multi-dimensional arrays of fixed size. Take the inner product calculation program as an example. The original form is as below:

1
2
3
4
5
6
7
8
9
/* Compute i,k of fixed matrix product */
int fix_prod_ele(fix_matrix A, fix_matrix B, long i, long k)
{
long j;
int result = 0;
for (j = 0; j < N; j++)
result += A[i][j] * B[j][k];
return result;
}

  The optimized C code is as below:

1
2
3
4
5
6
7
8
9
10
11
12
13
/* Compute i,k of fixed matrix product */
int fix_prod_ele_opt(fix_matrix A, fix_matrix B, long i, long k) {
int *Aptr = &A[i][0]; /* Points to elements in row i of A */
int *Bptr = &B[0][k]; /* Points to elements in column k of B */
int *Bend = &B[N][k]; /* Marks stopping point for Bptr */
int result = 0;
do { /* No need for initial test */
result += *Aptr * *Bptr; /* Add next product to sum */
Aptr ++; /* Move Aptr to next column */
Bptr += N; /* Move Bptr to next row */
} while (Bptr != Bend); /* Test for stopping point */
return result;
}

  The optimization follows the following steps:

  1. Generating a pointer, which we have named Aptr, that points to successive elements in row i of A.

  2. Generating a pointer, which we have named Bptr, that points to successive elements in column k of B.

  3. Generating a pointer, which we have named Bend, that equals the value Bptr will have when it is time to terminate the loop.

  The key points of optimization are as below:

  1. Decide the starting address and end address of the array.

  2. Decide how to move the address forward.

  \(e.g.\)The C program below:

1
2
3
4
5
6
7
/* Set all diagonal elements to val */
void fix_set_diag(fix_matrix A, int val)
{
long i;
for (i = 0; i < N; i++)
A[i][i] = val;
}

  With the properties:

  • Starting address: &A[0][0].

  • Ending address: &A[N+1][0].

  • Moving stride: index += N + 1.

  And the optimized program:

1
2
3
4
5
6
7
8
9
10
void fix_set_diag_opt(fix_matrix A, int val)
{
int *Abase = &A[0][0];
long i = 0;
long iend = N * (N + 1);
do {
Abase[i] = val;
i += (N + 1);
} while (i != iend);
}

4.Variable-Size Arrays

  Programmers requiring variable-size arrays had to allocate storage for these arrays using functions such as malloc or calloc, and they had to explicitly encode the mapping of multidimensional arrays into single-dimension ones via row-major indexing.

  If we declare an array int A[exp1][exp2], the dimensions of the array are determined by evaluating the expressions expr1 and expr2 at the time the declaration is encountered.

  For example, the following C program:

1
2
3
int var_ele(long n, int A[n][n], long i, long j) {
return A[i][j];
}

  can be translated into the following assembly code:

1
2
3
4
5
var_ele:
imulq %rdx, %rdi # Compute n * i
leaq (%rsi,%rdi,4), %rax # Compute xA + 4(n * i)
movl (%rax,%rcx,4), %eax # Read from M[xA + 4(n * i) + 4 * j]
ret

  Instead of using leaq, the dynamic version must use imul to scale i by n. This is because we don't know the exact size of array until encounter the declaration.

  The optimization of variable-size arrays is similar to that of the fixed-size:

  For example, the C program:

1
2
3
4
5
6
7
8
9
int var_prod_ele(long n, int A[n][n], int B[n][n], long i, long k) {
long j;
int result = 0;

for (j = 0; j < n; j++)
result += A[i][j] * B[j][k];

return result;
}

  can be optimized as:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
/* Compute i,k of variable matrix product */
int var_prod_ele_opt(long n, int A[n][n], int B[n][n], long i, long k)
{
int *Arow = A[i];
int *Bptr = &B[0][k];
int result = 0;
long j;
for (j = 0; j < n; j++)
{
result += Arow[j] * *Bptr;
Bptr += n;
}
return result;
}

We can see that GCC is able to recognize patterns that arise when a program steps through the elements of a multidimensional array, and generate the pointer-based code or the array-based code to avoid direct multiple calculation.