3.9.Heterogeneous Data Structures

发表于 2024-07-17 更新于 2024-07-18 分类于 CSAPP ， Chapter 3.Machine-level Representation of Program 阅读次数：本文字数： 860 阅读时长 ≈ 3 分钟

\(3.9.\)Heterogeneous Data Structures

1.Structures

The implementation of structures is similar to that of arrays: all of the components of a structure are stored in a contiguous region of memory and a pointer to a structure is the address of its first byte.

The compiler maintains information about each structure type indicating the byte offset of each field.

It generates references to structure elements using these offsets as displacements in memory referencing instructions.

For example, the structure below:

struct rec {
    int i;
    int j;
    int a[2];
    int *p;
};

is represented in memory like this:

and the compiler access the code by adding the appropriate offset to the address of the structure:

1
2
3

# Registers: r in %rdi
movl (%rdi), %eax   # Get r->i
movl %eax, 4(%rdi)  # Store in r->j

To generate a pointer to an object within a structure, we simply add the field's offset to the structure address:

1 2	# Registers: r in %rdi, i %rsi leaq 8(%rdi,%rsi,4), %rax # Set %rax to &r->a[i]

The selection of the different fields of a structure is handled completely at compile time. The machine code contains no information about the field declarations or the names of the fields.

2.Unions

Unions allow a single object to be referenced according to multiple types. Rather than having the different fields reference different blocks of memory, they all reference the same block.

The overall size of a union equals the maximum size of any of its fields.

Unions are used in several cases:

We know in advance that the use of two different fields in a data structure will be mutually exclusive.

For example, we can represent a binary tree using structure:

struct node_s {
    struct node_s *left;
    struct node_s *right;
    double data[2];
};

which use 32 bytes. But if we use union:

union node_u
{
    struct {
        union node_u *left;
        union node_u *right;
    } internal;
    double data[2];
};

it will only take up 16 bytes.

Moreover, to identify leaf with internal node, we can use an enumerated type defining the different possible choices for the union, and then create a structure containing a tag field and the union:

typedef enum { N_LEAF, N_INTERNAL } nodetype_t;

struct node_t {
    nodetype_t type;
    union {
        struct {
            struct node_t *left;
            struct node_t *right;
        } internal;
        double data[2];
    } info;
};

// example

    // leaves
    struct node_t *leaf = malloc(sizeof(struct node_t));
    leaf->type = N_LEAF;
    leaf->info.data[0] = 1.23;
    leaf->info.data[1] = 4.56;

    // internal
    struct node_t *internal = malloc(sizeof(struct node_t));
    internal->type = N_INTERNAL;
    internal->info.internal.left = leaf;
    internal->info.internal.right = NULL;

The union won't change the bit it stored according to different types, even if the value is stored in one type but access in another.

Say we convert a double d to an unsigned long variable:

1	unsigned long u = (unsigned long) d;

Value u will be an integer representation of d. Except for the case where d is 0.0, the bit representation of u will be very different from that of d.

However, if we use unions:

unsigned long double2bits(double d) {
    union {
        double d;
        unsigned long u;
    } temp;
    temp.d = d;
    return temp.u;
};

The result will be that u will have the same bit representation as d, including fields for the sign bit, the exponent, and the significand.

And with this characteristic, we can design the following program that gets the words of a double number:

double uu2double(unsigned word0, unsigned word1) {
    union {
        double d;
        unsigned u[2];
    } temp;
    temp.u[0] = word0; // low-order 4 bytes
    temp.u[1] = word1; // high-order 4 bytes
    return temp.d;
}

3.Data Alignment

Data alignment is operation that enforce the address of some objects must be a multiple of some valueK (typically 2, 4, or 8).

The align operation can be declared by the following statement:

.align 8

This ensures that the data following it will start with an address that is a multiple of 8.

To do data alignment, the compiler uses these two techniques. Take the following C struction as an example:

struct S1 {
    int i;
    char c;
    int j;
};

Byte gap:

End padding: This is to ensure that the latter data will also be aligned:

\(p.s.\)Each type of data has its own alignment requirement, so we can minimize the wasted place by rearrange the fields of structure. Note that we don't have to make all variables applied to an only alignment rule!

\(e.g.\) The structure:

struct {
    int *a;
    float b;
    char c;
    short d;
    long e;
    double f;
    int g;
    char *h;
} rec;

is rearranged to the structure below:

struct {
    double f;   // 8 bytes
    long e;     // 8 bytes
    int *a;     // 8 bytes
    char *h;    // 8 bytes
    int g;      // 4 bytes
    float b;    // 4 bytes
    short d;    // 2 bytes
    char c;     // 1 byte
    // 3 bytes padding to make the total size a multiple of 8
} rec;

Following the strategy of putting the fields required for larger alignment first.