3.1.Translating & Starting a Program

1.A complex example

  When translating a program into assembly, follow this process:

  1. Decide which variables go in which registers

  2. Determine which registers need to be saved on function entry (the function preamble/prolog)

    1. Decrement the stack pointer
    2. Save all the callee-saved registers we will use
    3. Save ra and any caller-saved registers whose values must stay live across calls to other functions
  3. Translate the code itself

  4. Restore all the registers necessary (the function postamble/epilog)

    1. Restore the callee-saved registers we used
    2. Restore ra
    3. Increment the stack pointer
  5. Return using jr ra

  Some points to note in this process:

  • To return null, load 0 into a0 and jump to the postamble:

    li a0, 0
    j postamble

  • When calling a function, we can use the call pseudo-instruction instead of writing jal ra, label. For example, call malloc.

2.Interpretation

  • An interpreter directly executes a program in the source language.

  • Since the interpreter works closer to the high-level source, it can give better error messages while interpreting.

  • Because the interpreter executes the source (or an intermediate representation) rather than machine code for one specific machine, the same source code can run on different machines without modification.
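
  The points above can be illustrated with a minimal sketch. It assumes a made-up toy postfix language (integers, + and *), which is not part of these notes. The function interpret is a hypothetical name. It executes the source directly, token by token, and because it still has the source line at hand, it can report errors in terms of the source.

```python
def interpret(source):
    """Directly execute a toy postfix program, one token at a time."""
    stack = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for token in line.split():
            if token.lstrip("-").isdigit():
                stack.append(int(token))
            elif token in ("+", "*"):
                if len(stack) < 2:
                    # The interpreter knows the source position, so the
                    # error message can point at the offending line.
                    raise ValueError(f"line {line_no}: '{token}' needs two operands")
                b, a = stack.pop(), stack.pop()
                stack.append(a + b if token == "+" else a * b)
            else:
                raise ValueError(f"line {line_no}: unknown token '{token}'")
    return stack


print(interpret("2 3 +\n4 *"))  # prints [20]
```

  The same Python source runs unchanged on any machine that has a Python interpreter, which is the portability point made above.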

3.Calling chain

4.Compilation

  \(a.\)Concepts

  The compiler transforms the C program into an assembly language program, a symbolic form of what the machine understands. High-level language programs take many fewer lines of code than assembly language, so programmer productivity is much higher.

  \(b.\)Assembler

  • The assembler converts assembly language instructions into machine language. It turns the assembly language program into an object file, which is a combination of machine language instructions, data, and information needed to place instructions properly in memory.

  • The assembler also accepts numbers in a variety of bases.

  To produce the binary version of each instruction in the assembly language program, the assembler must determine the addresses corresponding to all labels. It keeps track of the labels and their addresses in a symbol table.
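
  A minimal sketch of how a symbol table can be built and used, assuming every instruction is 4 bytes, labels sit on their own line ending in ":", and the text starts at address 0. The function names build_symbol_table and resolve_labels are hypothetical. Two passes are used because a label such as done may be referenced before it is defined.

```python
def build_symbol_table(lines, base=0):
    """Pass 1: record the address of every label."""
    symbols, address = {}, base
    for line in lines:
        line = line.split("#")[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            address += 4  # assumption: every instruction is 4 bytes
    return symbols


def resolve_labels(lines, symbols, base=0):
    """Pass 2: replace a trailing label operand with its PC-relative offset."""
    resolved, address = [], base
    for line in lines:
        line = line.split("#")[0].strip()
        if not line or line.endswith(":"):
            continue
        parts = line.replace(",", " ").split()
        if parts[-1] in symbols:
            parts[-1] = str(symbols[parts[-1]] - address)
        resolved.append((address, " ".join(parts)))
        address += 4
    return resolved


program = [
    "loop:",
    "    addi t0, t0, -1",
    "    bne t0, x0, loop",
    "    j done            # forward reference",
    "    addi a0, a0, 1",
    "done:",
    "    jr ra",
]
symbols = build_symbol_table(program)
print(symbols)  # prints {'loop': 0, 'done': 16}
for address, text in resolve_labels(program, symbols):
    print(address, text)
```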

  • An object file on UNIX systems mainly consists of:

    • The object file header describes the size and position of the other pieces of the object file.

    • The text segment contains the machine language code.

    • The static data segment contains data allocated for the life of the program.

    • The relocation information identifies instructions and data words that depend on absolute addresses when the program is loaded into memory.

    • The symbol table contains the remaining labels that are not defined, such as external references.

      • Use dummy "placeholders" for unresolved absolute and external references.
    • The debugging information contains a concise description of how the modules were compiled.


\(p.s.\) The auipc instruction shifts its 20-bit immediate left by 12 bits, adds the result to the current PC, and writes the sum to the destination register. This means that the generated address is relative to the current instruction. A following addi supplies the lower 12 bits of the offset.

The lui instruction is used to generate absolute addresses, which are fixed addresses of data in memory. This approach works only for memory addresses that do not move. It loads its 20-bit immediate into the upper 20 bits of the register and clears the lower 12 bits; an addi can then supply the lower 12 bits.
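
The arithmetic behind both pairs can be sketched as follows, assuming 32-bit addresses. The function names split_hi_lo, lui_addi, and auipc_addi are hypothetical helpers that model the instructions, not real tools. One detail worth seeing in code: addi sign-extends its 12-bit immediate, so when the lower 12 bits are 0x800 or more, the upper part must be rounded up by one to compensate.

```python
MASK32 = 0xFFFFFFFF


def split_hi_lo(value):
    """Split a 32-bit value into a 20-bit upper part and a signed 12-bit lower part."""
    value &= MASK32
    lo = value & 0xFFF
    if lo >= 0x800:  # addi will sign-extend, so treat lo as negative
        lo -= 0x1000
    hi = ((value - lo) >> 12) & 0xFFFFF
    return hi, lo


def lui_addi(address):
    """Model lui rd, hi followed by addi rd, rd, lo (absolute address)."""
    hi, lo = split_hi_lo(address)
    return ((hi << 12) + lo) & MASK32


def auipc_addi(pc, target):
    """Model auipc rd, hi followed by addi rd, rd, lo (PC-relative address)."""
    hi, lo = split_hi_lo(target - pc)
    return (pc + (hi << 12) + lo) & MASK32


hi, lo = split_hi_lo(0x12345FFC)
print(hex(hi), lo)  # prints 0x12346 -4
print(hex(lui_addi(0x12345FFC)))  # prints 0x12345ffc
print(hex(auipc_addi(0x00400010, 0x10000000)))  # prints 0x10000000
```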

  \(c.\)Linker

  The linker makes it possible to compile and assemble each procedure independently, so that a change to one line requires recompiling and reassembling only one procedure. It is a systems program that combines independently assembled machine language programs and resolves all undefined labels into an executable file.

  There are three steps for the linker:

  1. Place code and data modules symbolically in memory.
  2. Determine the addresses of data and instruction labels.
  3. Patch both the internal and external references.

  The linker uses the relocation information and symbol table in each object module to resolve all undefined labels.

  • For each entry in the relocation table, the linker replaces the placeholder with the actual address, taken from the symbol table, of the item being linked to.

  If all external references are resolved, the linker next determines the memory locations each module will occupy. Since the files were assembled in isolation, the assembler couldn't know where a module's instructions and data would be placed relative to other modules. So when the linker places a module in memory, all absolute references, that is, memory addresses that are not relative to a register, must be relocated to reflect its true location.
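
  The three linker steps can be sketched as follows. This is a simplified model under stated assumptions: each object module is a plain dictionary with a size, a table of defined labels (offsets within the module), and a list of relocation entries (offset, label). The field names, the function link, and the base address 0x10000 are all hypothetical and do not correspond to a real object file format.

```python
def link(modules, base=0x10000):
    # Step 1: place each module one after another in memory.
    placement, address = {}, base
    for mod in modules:
        placement[mod["name"]] = address
        address += mod["size"]

    # Step 2: determine the absolute address of every defined label.
    symbols = {}
    for mod in modules:
        for label, offset in mod["defs"].items():
            symbols[label] = placement[mod["name"]] + offset

    # Step 3: patch every relocation entry with the label's final address.
    patches = {}
    for mod in modules:
        for offset, label in mod["relocs"]:
            if label not in symbols:
                raise LookupError(f"undefined reference to '{label}'")
            patches[placement[mod["name"]] + offset] = symbols[label]
    return symbols, patches


main_o = {"name": "main.o", "size": 0x20, "defs": {"main": 0x0},
          "relocs": [(0x8, "helper"), (0xC, "counter")]}
util_o = {"name": "util.o", "size": 0x10,
          "defs": {"helper": 0x0, "counter": 0xC}, "relocs": []}

symbols, patches = link([main_o, util_o])
print({name: hex(addr) for name, addr in symbols.items()})
print({hex(site): hex(addr) for site, addr in patches.items()})
```

  Here main.o refers to helper and counter, which it does not define; the sketch resolves both from util.o and records which addresses must be patched.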

  \(d.\)Type of addressing

  \(e.\)Loader

5.Dynamically linked libraries

  Traditionally, libraries are linked before the program is run. Although this static approach is the fastest way to call library routines, it has a few disadvantages:

  1. The library routines become part of the executable code. If a new version of the library is released, the statically linked program keeps using the old version.

  2. It loads all routines in the library that are called anywhere in the executable, even if those calls are not executed.

  The alternative is dynamically linked libraries, where the library routines are not linked and loaded until the program is run.

  1. The first time the library routine is called, the program calls the dummy entry and follows the indirect branch, which points to a piece of code.

  2. The code puts a number in a register to identify the desired library routine.

  3. Then it branches to the dynamic linker/loader.

  4. The linker/loader finds the wanted routine, remaps it, and changes the address in the indirect branch location to point to that routine. Then it branches to it.

  5. When the routine completes, it returns to the original calling site.

  The dynamic linker loads the library function into memory on the first call and updates the indirect branch address. On subsequent calls to that library function, the program jumps directly to the correct memory address, avoiding repeated symbol resolution and address calculation, which makes the program more efficient.
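
  The lazy binding described above can be modeled with a short sketch. It is an analogy in Python, not how a real dynamic linker is implemented: the class LazyLibrary, its table of indirect branch targets, and the lookups counter are all hypothetical. The point it shows is that the resolution work happens only on the first call to each routine.

```python
class LazyLibrary:
    def __init__(self, routines):
        self._routines = routines  # stands in for the library on disk
        self.table = {}            # indirect branch targets, one per routine
        self.lookups = 0           # how many times the dynamic linker ran

    def _stub(self, name):
        def stub(*args):
            # Identify the routine, resolve it, and patch the table entry.
            self.lookups += 1
            self.table[name] = self._routines[name]
            # Then branch to the real routine.
            return self.table[name](*args)
        return stub

    def call(self, name, *args):
        # Every call goes through the indirect branch slot; initially
        # the slot points at the stub rather than the routine itself.
        if name not in self.table:
            self.table[name] = self._stub(name)
        return self.table[name](*args)


lib = LazyLibrary({"square": lambda x: x * x})
print(lib.call("square", 3))  # prints 9 (resolved on this first call)
print(lib.call("square", 4))  # prints 16 (jumps straight to the routine)
print(lib.lookups)            # prints 1
```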