A compiler takes as input a source program and produces as output an equivalent sequence of machine instructions. This work is divided into phases. A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation. The phases of a compiler are:
- Lexical Analyzer
- Syntax Analyzer
- Intermediate Code Generator
- Code Optimization
- Code Generation
1. Lexical Analyzer
- This is the first phase of the compiler.
- Also called a scanner.
- Separates characters of the source language into groups that logically belong together.
- These groups are called tokens: keywords such as DO or IF, identifiers, operator symbols such as <= or +, and punctuation symbols such as parentheses or commas.
- The output of the lexical analyzer is a stream of tokens.
- These tokens are passed to the next phase.
- The tokens are represented by codes (e.g. DO might be 1, + might be 2, identifier might be 3, etc.).
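The scanning step described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the codes for DO, + and identifier follow the example in the notes, while every other code, the token names, and the `tokenize` function are assumptions made for the sketch.

```python
import re

# Token codes. DO=1, PLUS=2 and IDENT=3 follow the example in the notes;
# the remaining codes are hypothetical.
TOKEN_CODES = {"DO": 1, "PLUS": 2, "IDENT": 3, "IF": 4, "LE": 5,
               "LPAREN": 6, "RPAREN": 7, "COMMA": 8, "NUMBER": 9}

# Order matters: keywords are tried before identifiers.
TOKEN_SPEC = [
    ("DO", r"DO\b"),
    ("IF", r"IF\b"),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("LE", r"<="),
    ("PLUS", r"\+"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("COMMA", r","),
    ("SKIP", r"\s+"),  # whitespace separates tokens but is not a token itself
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Yield (code, lexeme) pairs for each token found in source."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":
            yield TOKEN_CODES[match.lastgroup], match.group()
```

For example, `list(tokenize("DO I"))` yields the code for DO followed by the code for an identifier, each paired with its lexeme.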
2. Syntax Analyzer
- This is the second phase of the compiler.
- Also called a parser.
- Groups tokens together into syntactic structures called expressions.
- Expressions might be combined to form statements.
- The syntactic structure can be regarded as a tree whose leaves are tokens.
- The interior nodes of the tree represent strings of tokens that logically belong together.
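The tree view of the syntactic structure can be illustrated with a small recursive-descent parser. This is only a sketch under stated assumptions: the grammar (operators `+` and `*`, with parentheses), the plain-string token format, and the nested-tuple tree shape `(operator, left, right)` are all chosen for the example and are not taken from the notes.

```python
def parse_expression(tokens):
    """Parse a token list such as ["a", "+", "b", "*", "c"] into a tree.

    Leaves are the tokens themselves; interior nodes are tuples of the
    form (operator, left_subtree, right_subtree).
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = peek()
        if token == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token is None or token in {"+", "*", ")"}:
            raise SyntaxError(f"unexpected token {token!r}")
        return advance()  # an operand is a leaf of the tree

    def term():
        node = factor()
        while peek() == "*":
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

Because `*` is handled one level below `+`, the tokens for `a + b * c` group as `("+", "a", ("*", "b", "c"))`: the interior `*` node stands for the token string `b * c`.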
3. Intermediate Code Generator
- This is the third phase of the compiler.
- Uses the structure produced by the syntax analyzer to create a stream of simple instructions.
- There may be many styles of intermediate code.
- The most common style uses instructions with one operator and a small number of operands.
- Instructions are like macros.
- Intermediate code need not specify the registers to be used for each operation.
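One common instance of "one operator and a small number of operands" is three-address code. The sketch below assumes the syntax tree is given as nested tuples `(operator, left, right)` with strings as leaves; that tree shape, the temporary names `t1`, `t2`, ..., and the function name are assumptions made for illustration. Note that no registers are mentioned anywhere in the output.

```python
from itertools import count


def generate_three_address(tree):
    """Return (instructions, result_name) for a nested-tuple expression tree.

    Each instruction is a tuple (destination, operator, left, right),
    read as: destination = left operator right.
    """
    instructions = []
    temp_numbers = count(1)

    def walk(node):
        if isinstance(node, str):  # leaf: a name or a constant
            return node
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = f"t{next(temp_numbers)}"  # fresh temporary for this result
        instructions.append((temp, op, left_name, right_name))
        return temp

    result = walk(tree)
    return instructions, result
```

For the tree of `a + b * c`, this produces `t1 = b * c` followed by `t2 = a + t1`, with `t2` holding the final result.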
4. Code Optimization
- This is the fourth and optional phase of the compiler.
- Designed to improve the intermediate code.
- The aim is an object program that runs faster, takes less space, or both.
- Its output is another intermediate code program that does the same job as the original.
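Constant folding is one simple example of such an improvement; it is used here only as an illustration, since the notes do not name a specific optimization. The sketch assumes intermediate code in the form of tuples `(destination, operator, left, right)`, that each temporary is assigned exactly once, and that constants are non-negative integer literals written as strings.

```python
import operator

# Operators this sketch knows how to evaluate at compile time.
FOLDABLE = {"+": operator.add, "*": operator.mul}


def fold_constants(instructions):
    """Evaluate operations whose operands are both integer constants.

    Folded instructions are dropped, and later uses of their destination
    are replaced by the computed constant.
    """
    known = {}      # temporary name -> constant value, as a string
    optimized = []
    for dest, op, left, right in instructions:
        left = known.get(left, left)
        right = known.get(right, right)
        if op in FOLDABLE and left.isdigit() and right.isdigit():
            known[dest] = str(FOLDABLE[op](int(left), int(right)))
        else:
            optimized.append((dest, op, left, right))
    return optimized
```

The output is again intermediate code that computes the same values, but with fewer instructions to execute at run time.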
5. Code Generation
- This is the last phase of the compiler.
- Produces object code by deciding:
  - the memory locations for data,
  - the code used to access each datum,
  - the registers in which each computation is to be done.
- One of the difficult parts of the compiler.
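A very naive version of the register decision can be sketched as follows. Everything machine-specific here is hypothetical: the `LOAD`, `ADD` and `MUL` mnemonics, the register names `R0` to `R3`, and the two-operand instruction format are invented for the example. The sketch also assumes the input is a list of tuples `(destination, operator, left, right)` in which every temporary is used exactly once, which a real code generator cannot assume.

```python
def generate_code(instructions, registers=("R0", "R1", "R2", "R3")):
    """Translate (dest, op, left, right) tuples into hypothetical assembly lines."""
    opcodes = {"+": "ADD", "*": "MUL"}  # hypothetical mnemonics
    free = list(registers)
    location = {}  # temporary name -> register currently holding it
    code = []

    for dest, op, left, right in instructions:
        if left in location:
            # The left operand is already in a register; compute in place.
            reg = location.pop(left)
        else:
            if not free:
                raise RuntimeError("out of registers")
            reg = free.pop(0)
            code.append(f"LOAD {reg}, {left}")
        # A temporary is addressed by its register, anything else by name.
        code.append(f"{opcodes[op]} {reg}, {location.get(right, right)}")
        if right in location:
            # Single-use assumption: the right temporary is dead now.
            free.append(location.pop(right))
        location[dest] = reg
    return code
```

Even this toy version has to track which values live in which registers and when a register becomes free again, which hints at why this phase is difficult.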
Apart from these phases, two routines interact with all phases of the compiler:
Table Management
- Also called bookkeeping.
- The compiler keeps track of the names used by the program.
- Records essential information about each name, such as its type (integer, real, etc.).
- The data structure used to record this information is called a symbol table.
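A symbol table can be sketched as a mapping from names to their recorded attributes. The class and method names below are assumptions for illustration, and only the type is stored; a real compiler would typically record more (scope, storage location, and so on).

```python
class SymbolTable:
    """Minimal symbol table: maps each name to the information recorded for it."""

    def __init__(self):
        self._entries = {}

    def declare(self, name, type_name):
        """Record a new name together with its type."""
        if name in self._entries:
            raise KeyError(f"{name!r} is already declared")
        self._entries[name] = {"type": type_name}

    def lookup(self, name):
        """Return the recorded information, or None if the name is unknown."""
        return self._entries.get(name)


table = SymbolTable()
table.declare("count", "integer")
table.declare("rate", "real")
```

Any phase can then call `table.lookup("count")` to find out that `count` is an integer.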
Error Handler
- Invoked when a flaw in the source program is detected.
- Must warn the programmer by issuing diagnostic information.
- Compilation of a flawed program continues, at least through the syntax analysis phase, so that as many errors as possible can be detected in one compilation.
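The idea of reporting a flaw and then carrying on can be sketched with a deliberately tiny check. The example only looks at parenthesis balance on each line, which is an assumption chosen to keep the sketch short; the function name and message format are likewise hypothetical.

```python
def check_parentheses(lines):
    """Collect a diagnostic for every unbalanced parenthesis, without stopping."""
    diagnostics = []
    for line_number, line in enumerate(lines, start=1):
        depth = 0
        for column, char in enumerate(line, start=1):
            if char == "(":
                depth += 1
            elif char == ")":
                depth -= 1
                if depth < 0:
                    diagnostics.append(
                        f"line {line_number}, column {column}: unmatched ')'"
                    )
                    depth = 0  # recover and keep scanning
        if depth > 0:
            diagnostics.append(f"line {line_number}: {depth} unclosed '('")
    return diagnostics
```

Because the checker recovers after each problem instead of stopping at the first one, a single run over `["a = (b + c", "d = e)"]` reports both flaws.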
Passes
In a compiler, portions of one or more phases are combined into a module called a pass.
- A pass reads the source program or output of the previous pass.
- Makes the transformations specified by its phases.
- Writes output into an intermediate file, which may be read by a subsequent pass.
- A multi-pass compiler is slower than a single-pass compiler because each pass reads and writes an intermediate file.
- A compiler running on a computer with a small memory would use several passes.
- On a computer with a large RAM, fewer passes are possible.
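The read, transform, write cycle of a pass can be sketched with two toy passes that communicate through intermediate files. The work done by each pass (splitting on whitespace, then collecting identifier-like names), the JSON file format, and the file names are all assumptions for illustration and do not correspond to real compiler phases.

```python
import json
import os
import tempfile


def pass_one(source, out_path):
    """First pass: read the source text and write a token list to a file."""
    tokens = source.split()
    with open(out_path, "w") as handle:
        json.dump(tokens, handle)


def pass_two(in_path, out_path):
    """Second pass: read the previous pass's output and write the names used."""
    with open(in_path) as handle:
        tokens = json.load(handle)
    names = sorted({token for token in tokens if token.isidentifier()})
    with open(out_path, "w") as handle:
        json.dump(names, handle)


def run_passes(source):
    """Run both passes in order, connected only by intermediate files."""
    with tempfile.TemporaryDirectory() as workdir:
        tokens_file = os.path.join(workdir, "pass1.json")
        names_file = os.path.join(workdir, "pass2.json")
        pass_one(source, tokens_file)
        pass_two(tokens_file, names_file)
        with open(names_file) as handle:
            return json.load(handle)
```

Each pass only needs its own input and output in memory at one time, at the cost of the extra file reads and writes between passes.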