Latest revision as of 21:50, 10 March 2021

Lexical Analysis

  if (i == j)
    z = 0;
  else
    z = 1;

is indeed below in computers

\tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;

An implementation must do

Recognize substrings corresponding to tokens
Identify the token class of each lexeme

Token Class

Identifier, keywords, '(', ')', Numbers, ...

Token classes correspond to sets of strings.
Identifier: A1, Foo, B17
Integer: 0, 99
Keyword: 'else' or 'if' or 'begin' or ...
Whitespace: if___else

For the last code example, the tokens are: if, whitespace, (, i, == , j, \t, \n, else, z, =, 1, ;

                                 Token    
   string ---> Lexical Analysis -------> Parser

Regular Languages

Regular expressions specify regular languages.

Five constructs

Two base cases - empty and 1-character strings
Three compound expressions - union, concatenation, iteration.

Finite Automata

Regular expressions = specification
Finite automata = implementation

A finite automaton consists of

input alphabet
set of states
start state
set of accepting states
set of transitions

Parsing

	Input	Ouput
Lexer	Strings of characters	Strings of tokens
Parser	String of tokens	Parse tree

Context-Free Grammars (CFG)

Parser must distinguish between valid and invalid strings of tokens.

Programming languages have recursive structure.

Semantic Analysis

Last "front end" phase (together with lexical analysis & parsing) to enforce language
Catches all remaining errors

Coolc checks

all identifiers are declared
types
inheritance relationships
classes defined only once
methods in a class defined only once
reserved identifiers are not misused
...

Scope

Static scope: scope depends only on the program text, not on run-time behavior
Dynamically scope: scope depends on execution of the program

Type

It doesn't make sense to add a function pointer and an integer in C.
It does make sense to add two integers
But both have the same assembly language implementation!

A language's type system specifies which operations are valid for which types.

Three kinds of languages:

Statistically typed: All or almost all check of types is done as part of compilation (C, Java, Cool)
Dynamically typed: Almost all checking of types is done as part of program execution (Scheme, Python, Perl)
Untyped: No type checking (machine code)

Run-time Organization

Activation Record & Stack

Global and Heap

Memory

Low address  +----------------------+
             |          Code        |
             +----------------------+
             |      Static Data     |
             +----------------------+
             |   Stack (can grow)   |
             |          |           |
             |          v           |
             |......................|
             |          ^           |
             |          |           |
             |   Heap (can grow)    |
             |                      |
High address +----------------------+

Code Generation

Optimization

Garbage collection

What Is Garbage Collection, and How Does It Affect Your Program’s Performance?

Automatic Memory Managment

C and C++ programs have many storage bugs

forgetting to free unused memory
dereferencing a dangling pointer
overwriting parts of a data structure by accident
and so on ...

How do we know an object will "never be used again"?

Observation: a program can use only the objects that it can find:

  let x : A <- new A in { x <- y; ... }

An object x is reachable if and only if

a register contains a pointer to x, or
another reachable object y contains a pointer to x.

An unreachable object can never be used - such objects are garbage.

Mark and Sweep phases

When memory runs out, GC executes two phases

Mark phase: trace reachable objects
Sweep phase: collects garbage objects

Every object has an extra bit: the mark bit

reserved for memory management
initially the mark bit is 0
set to 1 for the reachable objects in the mark phase.

Stop and Copy

Resource

coursera.org

@@ Line 71: / Line 71: @@
 = Semantic Analysis =
+* Last "front end" phase (together with lexical analysis & parsing) to enforce language
+* Catches all remaining errors
+Coolc checks
+# all identifiers are declared
+# types
+# inheritance relationships
+# classes defined only once
+# methods in a class defined only once
+# reserved identifiers are not misused
+# ...
+== Scope ==
+* Static scope: scope depends only on the program text, not on run-time behavior
+* Dynamically scope: scope depends on execution of the program
+== Type ==
+* It doesn't make sense to add a function pointer and an integer in C.
+* It does make sense to add two integers
+* But both have the same assembly language implementation!
+A language's type system specifies which operations are valid for which types.
+Three kinds of languages:
+* Statistically typed: All or almost all check of types is done as part of compilation (C, Java, Cool)
+* Dynamically typed: Almost all checking of types is done as part of program execution (Scheme, Python, Perl)
+* Untyped: No type checking (machine code)
+= Run-time Organization =
+== Activation Record & Stack ==
+== Global and Heap ==
+<pre>
+Memory
+Low address  +----------------------+
+             |          Code        |
+             +----------------------+
+             |      Static Data     |
+             +----------------------+
+             |   Stack (can grow)   |
+             |          |           |
+             |          v           |
+             |......................|
+             |          ^           |
+             |          |           |
+             |   Heap (can grow)    |
+             |                      |
+High address +----------------------+
+</pre>
+= Code Generation =
 = Optimization =
-= Code Generation =
+= Garbage collection =
+[https://www.cloudsavvyit.com/10031/what-is-garbage-collection-and-how-does-it-affect-your-programs-performance/ What Is Garbage Collection, and How Does It Affect Your Program’s Performance?]
+== Automatic Memory Managment ==
+C and C++ programs have many storage bugs
+* forgetting to free unused memory
+* dereferencing a dangling pointer
+* overwriting parts of a data structure by accident
+* and so on ...
+How do we know an object will "never be used again"?
+Observation: a program can use only the objects that it can find:
+<pre>
+  let x : A <- new A in { x <- y; ... }
+</pre>
+An object x is '''reachable''' if and only if
+* a register contains a pointer to x, or
+* another reachable object y contains a pointer to x.
+An unreachable object can never be used - such objects are '''garbage'''.
+== Mark and Sweep phases ==
+When memory runs out, GC executes two phases
+* Mark phase: trace reachable objects
+* Sweep phase: collects garbage objects
+Every object has an extra bit: the mark bit
+* reserved for memory management
+* initially the mark bit is 0
+* set to 1 for the reachable objects in the mark phase.
+== Stop and Copy ==
 = Resource =
-* coursera.org
+* [https://class.coursera.org/compilers-004/ coursera.org]