Compiler
Lexical Analysis
if (i == j) z = 0; else z = 1;
The computer actually sees this code as the following string of characters:
\tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
An implementation must do two things:
- Recognize substrings corresponding to tokens
- Identify the token class of each lexeme
Token Class
Identifier, keywords, '(', ')', Numbers, ...
- Token classes correspond to sets of strings.
- Identifier: A1, Foo, B17
- Integer: 0, 99
- Keyword: 'else' or 'if' or 'begin' or ...
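- Whitespace: a run of blanks, tabs, or newlines, such as the three blanks (shown as underscores) in if___else
For the last code example, the distinct lexemes are: if, else, (, ), i, j, z, ==, =, 0, 1, ;, and whitespace (blanks, \t, \n).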
Character string ---> Lexical Analysis ---> Token string ---> Parser
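The two tasks of a lexer (recognizing substrings and identifying their token class) can be sketched in a few lines of Python. This is only a minimal illustration: the token class names (RELOP, ASSIGN, ...) and the patterns are assumptions made for this example, not taken from any particular compiler.

```python
import re

# Token classes and patterns are illustrative assumptions.
# Order matters: keywords are tried before identifiers, "==" before "=".
TOKEN_SPEC = [
    ("WHITESPACE", r"[ \t\n]+"),
    ("KEYWORD",    r"\b(?:if|else)\b"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("INTEGER",    r"[0-9]+"),
    ("RELOP",      r"=="),
    ("ASSIGN",     r"="),
    ("LPAREN",     r"\("),
    ("RPAREN",     r"\)"),
    ("SEMICOLON",  r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_class, lexeme) pairs, or raise on bad input."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        # lastgroup is the name of the alternative that matched: the token class.
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


source = "\tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;"
tokens = tokenize(source)
```

Each pair in tokens is a (class, lexeme) token; a parser would normally discard the WHITESPACE tokens and consume the rest.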
Regular Languages
Regular expressions specify regular languages.
Five constructs
- Two base cases - empty and 1-character strings
- Three compound expressions - union, concatenation, iteration.
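As a rough illustration, the five constructs can be mapped onto Python's re syntax. The helper names (union, concat, star, in_language) are made up for this sketch, and the character classes [0-9] and [A-Za-z] are used as shorthand for a long union of single characters.

```python
import re

# Base cases
EMPTY = r""        # matches only the empty string
CHAR_A = r"a"      # matches only the 1-character string "a"


# Compound expressions, each built from smaller regular expressions
def union(r, s):
    """Strings matched by r or by s."""
    return f"(?:{r}|{s})"


def concat(r, s):
    """A string matched by r followed by a string matched by s."""
    return f"(?:{r}{s})"


def star(r):
    """Iteration: zero or more repetitions of r."""
    return f"(?:{r})*"


digit = r"[0-9]"        # shorthand for the union 0|1|...|9
letter = r"[A-Za-z]"    # shorthand for the union of all ASCII letters

integer = concat(digit, star(digit))
identifier = concat(letter, star(union(letter, digit)))


def in_language(regex, s):
    """True if the whole string s belongs to the language of regex."""
    return re.fullmatch(regex, s) is not None
```

With these definitions, in_language(integer, "99") and in_language(identifier, "B17") hold, while in_language(identifier, "17B") does not.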
Finite Automata
- Regular expressions = specification
- Finite automata = implementation
A finite automaton consists of
- input alphabet
- set of states
- start state
- set of accepting states
- set of transitions
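The five components translate almost directly into a data structure. The sketch below is one possible encoding of a deterministic automaton; the class name DFA, the state names, and the choice to reject when no transition exists are assumptions of this example.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    alphabet: frozenset      # input alphabet
    states: frozenset        # set of states
    start: str               # start state
    accepting: frozenset     # set of accepting states
    transitions: dict        # (state, symbol) -> next state

    def accepts(self, text):
        """Run the automaton on text; accept if it ends in an accepting state."""
        state = self.start
        for ch in text:
            if ch not in self.alphabet:
                return False
            nxt = self.transitions.get((state, ch))
            if nxt is None:          # no transition defined: reject
                return False
            state = nxt
        return state in self.accepting


# Example: an automaton for non-empty digit strings (the Integer token class).
DIGITS = frozenset("0123456789")
integer_dfa = DFA(
    alphabet=DIGITS,
    states=frozenset({"start", "in_int"}),
    start="start",
    accepting=frozenset({"in_int"}),
    transitions={
        **{("start", d): "in_int" for d in DIGITS},
        **{("in_int", d): "in_int" for d in DIGITS},
    },
)
```

Here integer_dfa.accepts("99") is true, while the empty string and "9a" are rejected.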
Parsing
Phase | Input | Output |
---|---|---|
Lexer | String of characters | String of tokens |
Parser | String of tokens | Parse tree |
Context-Free Grammars (CFG)
Parser must distinguish between valid and invalid strings of tokens.
Programming languages have recursive structure.
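To make the recursive structure concrete, here is a small recursive-descent parser for a toy grammar. Both the grammar (E -> T '+' E | T and T -> INT | '(' E ')') and the tuple-based parse tree are assumptions chosen for this sketch; real parsers are often generated from the grammar instead.

```python
def parse(tokens):
    """Parse a list of token strings with the toy grammar
           E -> T '+' E | T
           T -> INT | '(' E ')'
    Return a parse tree as nested tuples; raise SyntaxError if invalid."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def parse_e():
        left = parse_t()
        if peek() == "+":
            expect("+")
            return ("E", left, "+", parse_e())   # E is defined in terms of E
        return ("E", left)

    def parse_t():
        nonlocal pos
        tok = peek()
        if tok is not None and tok.isdigit():
            pos += 1
            return ("T", tok)
        expect("(")
        inner = parse_e()                        # T contains a nested E
        expect(")")
        return ("T", "(", inner, ")")

    tree = parse_e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def is_valid(tokens):
    """True if the token string is in the language of the grammar."""
    try:
        parse(tokens)
        return True
    except SyntaxError:
        return False
```

For example, is_valid(["(", "(", "1", ")", ")"]) holds for any nesting depth, which a finite automaton cannot check, while is_valid(["(", "1"]) does not.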
Semantic Analysis
- Last "front end" phase (after lexical analysis and parsing) to enforce the language's rules
- Catches all remaining errors
Coolc checks
- all identifiers are declared
- types
- inheritance relationships
- classes defined only once
- methods in a class defined only once
- reserved identifiers are not misused
- ...
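Two of these checks can be sketched on a heavily simplified program representation. The function names and the flat lists of names used as input are assumptions of this example; this is not how coolc itself is implemented, and a real checker would work on the parse tree and respect scopes.

```python
def check_classes_defined_once(class_names):
    """Return the sorted class names that are defined more than once."""
    seen = set()
    duplicates = set()
    for name in class_names:
        if name in seen:
            duplicates.add(name)
        seen.add(name)
    return sorted(duplicates)


def check_identifiers_declared(declared, used):
    """Return used identifiers that have no declaration, in first-use order."""
    declared = set(declared)
    missing = []
    for name in used:
        if name not in declared and name not in missing:
            missing.append(name)
    return missing
```

An empty result from either function means that particular check passed.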
Scope
- Static scope: scope depends only on the program text, not on run-time behavior
- Dynamic scope: scope depends on the execution of the program
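The difference can be seen in a small experiment. Python itself is statically scoped, so the first half below runs as ordinary code; the second half emulates dynamic scope with an explicit stack of bindings, which is an assumption of this sketch (the names binding_stack and lookup are made up).

```python
# Static scope: the x used in show() is fixed by the program text.
x = "global"


def show():
    return x              # always the outer x, whoever calls show()


def caller():
    x = "local"           # a different variable, invisible to show()
    return show()


static_result = caller()

# Dynamic scope, emulated: lookup walks the bindings of the active calls,
# most recent first, so the answer depends on who is currently running.
binding_stack = [{"x": "global"}]


def lookup(name):
    for frame in reversed(binding_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)


def dyn_show():
    return lookup("x")


def dyn_caller():
    binding_stack.append({"x": "local"})   # binding lives while dyn_caller runs
    try:
        return dyn_show()
    finally:
        binding_stack.pop()


dynamic_result = dyn_caller()
```

Under static scope the result is "global"; under the emulated dynamic scope the same call pattern yields "local".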
Type
- It doesn't make sense to add a function pointer and an integer in C.
- It does make sense to add two integers
- But both have the same assembly language implementation!
A language's type system specifies which operations are valid for which types.
Three kinds of languages:
- Statically typed: All or almost all type checking is done as part of compilation (C, Java, Cool)
- Dynamically typed: Almost all checking of types is done as part of program execution (Scheme, Python, Perl)
- Untyped: No type checking (machine code)
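Since Python is listed among the dynamically typed languages, it can illustrate when the check happens. In the sketch below (the helper names add and try_add are made up), the definition of add is accepted without complaint; the invalid operation is only detected when it is actually executed.

```python
def add(a, b):
    # Nothing here is checked ahead of time: any a and b are accepted
    # until the "+" is actually evaluated.
    return a + b


def try_add(a, b):
    """Run add and report whether the run-time type check passed."""
    try:
        return ("ok", add(a, b))
    except TypeError:
        return ("type error", None)


int_result = try_add(1, 2)        # adding two integers makes sense
func_result = try_add(len, 1)     # adding a function and an integer does not
```

A statically typed language would reject the second call during compilation instead of during execution.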
Run-time Organization
Activation Record & Stack
Global and Heap
Memory
Low address  +---------------+
             |     Code      |
             +---------------+
             |  Static Data  |
             +---------------+
             |     Stack     |
             |...............|
             |               |
             |               |
High address +---------------+