- The most important criteria for a code generator to produce correct code.
- Correctness takes special significance because of the number of special cases that a code generator might face.
Input to the Code Generator
- The input to the code generator is an intermediate representation of the source program produced by the front end along with information in the symbol table that is used to determine the run time addresses of data objects denoted by the names in the IR.
- Choices of IR include three address representations such as quadruples, triples, indirect triples; virtual machine representations such as bytecode and stack machine code; linear representation such as postfix notation; graphical representation such as syntax trees and DAG’s.
- During code generation, we assume that the front end has scanned, parsed, and translated the source program into a relatively low-level IR, so that the values of the names appearing in the IR can be represented by quantities that the target machine can directly manipulate, such as integers and floating-point numbers.
- We also assume that all syntactic and static semantic errors have been detected, that necessary type checking has taken place, and that type conversion operators have been inserted wherever necessary.
The Target Program
- Instruction set architecture of the target machine has an impact on the difficulty of constructing a good code generator that produces high-quality machine code.
- The most common target machine architectures are RISC (reduced instruction set computer), CISC (complex instruction set computer), and stack-based.
- A RISC machine has many registers, three address instructions, simple addressing modes, and simple instruction set architecture.
- A CISC machine has few registers, two address instructions, a variety of addressing modes, several register classes, variable-length instructions, and instructions with side effects.
- In the stack-based machine, operations are done by pushing operands onto a stack and then performing the operations on the operand at top of the stack.
- To achieve high-performance top of the stack is typically kept in registers.
- Stack-based machines almost disappeared because it was felt that the stack organization was too limiting and required too many swap and copy operations.
- Stack based architecture was reviewed with the introduction of Java Virtual Machine (JVM).
- The JVM is a software interpreter for Java bytecode, an intermediate language produced by Java compilers.
- The interpreter provides software compatibility across multiple platforms, a major factor in the success of java.
- To overcome the high-performance penalty of interpretation, which can be on the order of a factor of 10, just in time (JIT) Java compilers have been created.
- These JIT compilers translate bytecode during runtime to the native hardware instruction set of the target machine.
- Another approach to improve java performance is to build a compiler that compiles directly into the machine instructions of the target machine, bypassing the java bytecode entirely.
- Producing an absolute machine language program as output has the advantage that it can be placed in a fixed location in memory and immediately executed.
- Programs can be compiled and executed quickly.
Instruction Selection:
- The code generator must map the IR program into a code sequence that can be executed by the target machine.
- The complexity of performing this mapping is determined by:
- Level of IR
- Nature of instruction set architecture
- The desired quality of generated code
If IR is high level:
- Code generator translates each IR statement into a sequence of machine instructions using code templates.
- This statement by statement code generation produces poor code and needs further optimization.
Nature of instruction set:
- If the target machine does not support each data type in a uniform manner, then each exception to the general rule requires special handling.
- On some machines, floating-point operations are done using separate registers.
- If we do not care about the efficiency of the target program, instruction selection is straightforward.
- For each type of three address statement, we can design a skeleton that defines the target code to be generated for that construct.
Example:
Every three address statement of the form
X = Y + Z
where X, Y, Z are statically allocated, can be translated into the code sequence:
LD R0, Y // R0 = Y, load Y into register R0
ADD R0, R0, Z // R0 = R0 + Z, add Z to R0
ST X, R0 // X = R0, store R0 into X
This strategy produces redundant load and stores.
Quality of code:
- Quality of code is determined by speed and size.
- If the target machine has an increment instruction (INC), then three address statement
a = a + 1 may be implemented more efficiently by single instruction INC a instead of more obvious sequence that loads a into register, add one to the register, then store the result back to a.
LD R0, a // R0 = a
ADD R0, R0, #1 // R0 = R0 + 1
ST a, R0 // a = R0
Register Allocation:
- Registers are fastest computational unit on target machine, but we do not have enough of them to hold all values.
- Instructions involving register operands are invariably shorter and faster than those involving operands in memory, so efficient utilization of register is important.
- Use of register is subdivided into two sub-problems:
- Register Allocation
- Register Assignment
Register Allocation: during which we select the set of variables that will reside in register at each point in the program.
Register Assignment: during which we pick the specific register that a variable will reside in. Example:
- Certain machine requires register pair (an even and next odd numbered registers) for some operands and results.
- On some machine integer multiplication and integer division involve register pairs.
Multiplication instruction is of the form M x, y
where x is multiplicand, is even register of even/ odd register pair and y is multiplier, odd register.
Division instruction of the form D x, y
where dividend occupies even/ odd register pair whose even register is x , the divisor is y. after division even register will hold remainder, odd register will hold quotient.
Consider two three address code sequence t = a + b t = a + b
t = t * c t = t + c
t = t / d t = t / d
(a) (b)
Shortest assembly code sequence for (a) and (b) are L R1, a L R0, a
A R1, b A R0, b
M R0, c A R0, c
D R0, d SRDA R0, 32
ST R1, t D R0, d
ST R1, T
(a) (b)
Ri stands for register i.
SRDA for shift right double arithmetic
SRDA R0, 32 shifts the dividend into R1 and clears R0 so all bits equal its sign bit.
Evaluation Order:
- The efficiency of the target code is affected by the order of computations.
- Some computation orders require fewer registers to hold intermediate results than others.
- Picking the best order in the general case is a difficult NP problem.