Table of Contents

Introduction
1. Practical information
Recommended reading
Compilers - how do they work?
Language classification
1. Dynamic vs static typing
2. Strongly typed vs weakly typed
Grammars
1. Notation
2. The Chomsky hierarchy
Finite automata
1. Deterministic finite automata (DFA)
2. Non-deterministic finite automata (NFA)
Parsing
Symbol tables
Optimizations
1. Optimizations and when to apply them
2. Data-flow analysis
Syntax directed translations
Nice tips for the final

‹

TDT4205: Compiler Construction

Tags:

Introduction

A compiler is a program that takes another program as input and translates it into a third program, often so that it may be run on some sort of machine. This course takes a look at the inner workings of compilers and how to manage the complexity of their design.

Practical information

Compilers: Principles, Techniques and Tools (aka The Dragon Book), by Aho, Lam, Sethi and Ullman, is the very valuable textbook for the course.
Course slides are available at the lecturers folk.idi.ntnu.no page, these are well made.
Tim Teitelbaum's slideset is used extensively in the course.
Alex Aikens compiler course on Coursea covers most topics of this course and some more.

Compilers - how do they work?

Compilers are programs which translate structured text from one language to another. This course studies compilers that compile computer programs. A compiler is usually built of several parts, often collectively known as the front-end and the back-end of the compiler. The front-end usually consists of a scanner, a parser, and an intermediate code generator (and optionally optimizer). The front-end also manages the symbol table. The back-end usually consists of a target language code generation, optionally with optimization.

Scanner

A scanner reads the input text as a stream of characters and translates them into a stream of tokens. Tokenizer is another word for scanner. The scanner is a lexical analyzer.

Parser

A parser reads the stream of tokens from the scanner and translates them into an abstract syntax tree using the grammar of the target language as a guide, so to speak. The parser is a syntax/semantic analyzer.

Intermediate code generator

An intermediate code generator generates the abstract syntax tree for a simple intermediate code representation. This is to make things easier for the optimizer in the next step.

Intermediate code optimizer

The intermediate code optimizer optimizes the intermediate code using lots of nifty tricks and techniques that are described in detail later in this compendium.

Target language code generator

The target language code generator finally generates the final code in the target language, based on the optimized intermediate code.

Language classification

Programming languages can be classified into many different classes, based on things such as type system, paradigm, language generation, use areas and so on, but this course seems to focus on classifications based on type systems.

Dynamic vs static typing

A programming language is dynamically typed when most of its type checking is performed at run-time instead of at compile-time. Values have types, but variables do not. Popular dynamic languages include Python, Ruby, PHP, Lisp, JavaScript, Objective-C, Oz (which happens to be the best language ever), and Smalltalk.

A programming language is statically typed if it does most of its type checking at compile-time. Static typing is a form of program verification because it can in its purest form guarantee type safety before the program is run. Popular static languages include Pascal, Ada, C/C++, Java/C#, ML, Fortran.

Dynamic languages are typically more expressive, but static languages are safer (with regard to program verification) and often faster.

A sound type system allow us to statically determine that a type error can occur, and we do not need to check this dynamically.

Strongly typed vs weakly typed

A programming language is strongly typed if it specifies restrictions on which types can be co-operated upon by an operator. A language may for instance allow (int) + (int), but disallow (str) + (int), because str and int are not (for this example) defined to be compatible types for the + operator. Popular strongly typed languages include Pascal, C#/Java, Python, OCaml and Ada.

A programming language is weakly typed if it does not specify restrictions on which types can be co-operated upon by an operator. Weakly typed programming languages often support features like implicit type conversion, ad-hoc polymorphism. Popular weakly typed languages include BASIC, JavaScript, Perl, PHP and C.

Weakly typed languages are typically more expressive, but strongly typed languages are safer (as in program verification).

Grammars

A grammar is a set of production rules for strings in a formal language. Grammars do not specify meaning of strings, or what can be done with them, or anything like that. Grammars only describe the form of the strings.

A grammar $G = (N,\sigma,P,S)$ consists of:

A finite set $N$ of nonterminal symbols
A finite set $\sigma$ of terminal symbols that is disjoint from N.
A finite set $P$ of production rules, each rule on the form: $(\sigma \cup N)* N(\sigma \cup N)* \rightarrow (\sigma \cup N)*$
A symbol $S \in N$ that is the start symbol

The language of a grammar $G$, $L(G)$, is defined as all the sentences that can be derived in a finite number of steps from the start symbol $S$.

Notation

When writing about grammars, the notation convention is to use capital letters for nonterminals, lowercase letters for terminals, $\epsilon$ or $\lambda$ for the empty string, and $S$ for the start symbol.

The Chomsky hierarchy

Grammars are classified into a hierarchy based on the restrictiveness of the production rules. A more restrictive class can express fewer languages than a less restrictive class. The Chomsky hierarchy is, in order or decreasing restrictiveness: $\text{Regular} \subseteq \text{Context-free} \subseteq \text{Context-sensitive} \subseteq \text{Unrestricted}$

Regular grammars

Regular grammars are the most restrictive grammars in the grammar hierarchy. Regular grammars describe a regular language. Regular languages can be expressed using a regular expression.

Context-free grammars

Context-free grammars are grammars where every production rule is of the form $V \rightarrow v$, where $V$ is a non-terminal and $v$ may be a combination of terminals and non-terminals.

CFGs, image screencapped from https://class.coursera.org/compilers-selfservice/lecture/35

Context-sensitive grammars

Context-sensitive grammars are grammars where every production is of the form $\alpha A \beta \rightarrow \alpha \gamma \beta$, where $\alpha$, $\beta$ and $\gamma$ are strings of nonterminals and terminals, and $\gamma$ is not $\epsilon$.

Unrestricted grammars

An unrestricted grammar is a grammar with no restrictions for the grammar productions.

Finite automata

Finite automaton consist of:

A finite set of states, $S$.
An input alphabet: A set of input symbols $\sum$.
Transitions between states.
A state in $S$ that is distinguished as the start state.
A set of final or accepting states

There are two kinds of finite automaton studied in this course: NFA and DFA. Both can be represented by transition graphs. Here is a nifty tool for drawing transition graphs, it even exports to LaTeX. Here is a series of random facts about finite automatons:

Every NFA has an equivalent DFA.
There exists algorithms which can convert NFA to DFA and back.
- NFA to DFA: see 3.7.1 on page 152 in the Dragon Book.
- DFA to NFA: DFAs can be thought of as a special case of NFAs, so you don't have to do anything to convert a DFA to an NFA.
Every finite automaton is equivalent with a regular expression.
finite automatons can be thought of as deciders deciding the question "is this input gramatically correct?" for the language associated with the finite automaton.

Deterministic finite automata (DFA)

DFA is a kind of finite automaton which has exactly one transition for each state for each possible input symbol.

Non-deterministic finite automata (NFA)

NFA is a kind of finite automaton which can have more than one transition per state per input symbol. They do not have to have defined transitions for each state for each symbol, and the empty string ($\epsilon$) is a valid input, allowing for "spontaneous" transitions between states without input. Also a symbol can label several edges out of the same state.

Parsing

Top down parsing

Top down parsers look for left-most derivations of an input-stream by searching for parse trees using a top-down expansion of a given grammar's production rules.

Recursive descent parser

A top-down parser built from a set of mutually recursive procedures where each such procedure usually implements one of the production rules of the grammar.

Predictive parser (LL-parser)

A predicitve parser is a recursive descent parser that does not require backtracking. Predictive parsers only work for LL(k)-grammars.

FIRST and FOLLOW sets

Example of computing FIRST and FOLLOW sets

When constructing a predictive parser, FIRST and FOLLOW sets are nice (read: necessary) to have. They allow us to fill in the entries of a predictive parsing table for a given grammar.

The $\text{FIRST}$ set of a string of symbols $\alpha$ is the set of terminals that begin the strings derived from $\alpha$. $\text{FIRST}(\alpha)$ can be constructed like this for each symbol $X \in \alpha$:

If $X$ is a terminal, $\text{FIRST}(X) = \left\{X\right\}$.
If $X \rightarrow \epsilon$ is a production, add $\epsilon$ to $\text{FIRST}(X)$
If $X \rightarrow Y_1Y_2…Y_k$ then add $\text{FIRST}(Y_1Y_2…Y_k)$ to $\text{FIRST}(X)$
$\text{FIRST}(Y_1Y_2…Y_k)$ is one of:
- $\text{FIRST}(Y_1)$ if $\text{FIRST}(Y_1)$ doesn't contain $\epsilon$.
- $\text{FIRST}(Y_1) + \text{FIRST}(Y_2…Y_k) - \left\{ \epsilon \right\}$

If $\text{FIRST}(Y_i)$ contains $\epsilon$ for all $0 \leq i \leq k \in \mathbb{Z}$, add $\epsilon$ to $\text{FIRST}(X)$

The FOLLOW set of a nonterminal $A$ is the set of terminals a that can appear immediately to the right of $A$. In other words: the set of terminals a such that there exists a derivation of the form $S \rightarrow \alpha A a \beta$. $\text{FOLLOW}(B)$ for all nonterminals $B$ can be computed like this:

Place $ (right endmarker) in $\text{FOLLOW}(S)$, where $S$ is the start symbol.
If there is a production $A \rightarrow \alpha B \beta$, then everything in $\text{FIRST}(\beta) - \left\{\epsilon\right\}$ is placed in $\text{FOLLOW}(B)$
If there is a production $A \rightarrow \alpha B$ or a production $A \rightarrow \alpha B \beta$ where $\text{FIRST}(\beta)$ contains $\epsilon$, everything in $\text{FOLLOW}(A)$ is in $\text{FOLLOW}(B)$.

Algorithm for constructing a predictive parsing table

INPUT: Grammar G.
OUTPUT: Parsing table M.
METHOD: For each production $A \to \alpha$ of the grammar, do the following:

For each terminal a in $\text{FIRST}(\alpha)$, add $A \to \alpha$ to M[A,a].
If $\epsilon$ is in $\text{FIRST}(\alpha)$, then for each terminal b (including $) in $\text{FOLLOW}(A)$, add $A \to \alpha$ to M[A, b].

Bottom-up parsing

Bottom-up parsers start with the input stream and attempt to reach the start symbol through a series of rewrites.

Shift-reduce parsing is a category of commonly used methods for bottom-up parsing. Shift-reduce is based on two basic steps: shift and reduce. Shift advances in the input stream by one token, making the new symbol a new single-node parse tree. Reduce applies a completed grammar rule to one the recent parse trees, joining them together as one tree with a new root symbol.

LR-parser

LR-parsers are left-reading, rightmost-derivating parsers that are O(n) for deterministic context-free languages. LR-parsing is a shift-reduce parsing method. Canonical LR-parsing, though fast, is prohibitively spaceous. Luckily, memory-usage-improved parsers exist, such as the SLR and LALR parsers.

Simple LR-parser (SLR-parser)

An SLR-parser is an LR(1)-parser with a relatively simple parser generator algorithm. SLR lookahead set generators calculate lookahead by an easy approximation method (follow sets) based direcly on the grammar, ignoring everything else.

Look-Ahead LR parser (LALR-parser)

An LALR-parser is an LR(k)-parser (k>0), and a simplified version of a canonical LR-parser. LALR lookahead set generators calculate lookahead by a more precise method based on exploring the graph of parser states and their transitions.

Handling ambiguity

Operator precedence and associativity

Operator precedence: calculations with higher precedence is done first. Associativity: left or right, whether to parse left-to-right or right-to-left with operators of the same precedence.

Example of associativity

An operator associates to the left if an operand with the given operator on both sides "belongs" to the left one. E.g. in the expression $a+b+c$, the operand $b$ is surrounded by $+$ operators. $+$ is left associative (by mathemagical convention, deal with it) so the expression $a+b$ takes precedence over $b+c$ and is evaluated first. Because of the left associativity of $+$ we could rewrite the expression to $(a+b)+c$. If $+$ was right-associative we'd rewrite it to $a+(b+c)$.

What if there are two different operators? Then the one with higher precedence wins. For instance: both mulitplication $*$ and addition are left-associative, but multiplication has higher precedence. So the expression $a+b*c$ would be rewritten as $a+(b*c)$.

If both operators have the same level of precedence, regular associativity applies.

In mathematics, addition and subtraction are both left-associative and have the same precedence. Associativity is important because $(a-b)+c$ and $a-(b+c)$ are not equivalent expressions. Left-associativty would ensure that $a-b+c$ is parsed as $(a-b)+c$.

The dangling else problem

Give a grammar with the non-terminals $S$ and $E$ and the productions $S \rightarrow \text{ if } E \text{ then } S$, $S \rightarrow\text{ if } E\text{ then } S \text{ else } S$, we may end up with a dangling else we don't know whether to shift or reduce with the simple string "if e1 then if e2 then s1 else s2".

Let's have a look at the parsing table:

Parse stack	Lookahead symbol	Unscanned	Action
empty	if	e1 then if e2 then s1 else s2	Shift
if	e1	then if e2 then s1 else s2	Shift
if e1	then	if e2 then s1 else s2	Shift
if e1 then	if	e2 then s1 else s2	Shift
if e1 then if	e2	then s1 else s2	Shift
if e1 then if e2	then	s1 else s2	Shift
if e1 then if e2 then	s1	else s2	Shift
if e1 then if e2 then s1	else	s2	Shift ("else" belongs to the innermost "if") or reduce innermost "if" to S?

Symbol tables

Symbol tables are data structures that store information about source-program constructs. Entries in the table contain information about all identifiers, such as its string (or lexeme), type, position in storage, etc. Each scope has its own symbol table, but the current scope "inherits" the contents of parent scopes. That is, any identifier in this scope is available in nested scopes. To create symbol tables, hash tables are usually used, where KEY: symbol, VALUE: information about the symbol. Usually, the parser creates the symbol table and populates it, as only with its knowledge of the semantics can it decide to create a new entry, or use a previously made one.

Optimizations

When compiling code, a compiler should generally try to optimize its output as much as possible. When a compiler optimizes its output, it is called an optimizing compiler. Optimization in the context refers to the processing of minimizing some attributes of the output program, most commonly execution time, but also memory footprint, without changing the semantics of the program. Optimization is typically computationally intense, with some problems being NP-complete or even undecidable.

Optimizations and when to apply them

Copied from slide number 27 of "Lecture 23: Introduction to Optimizations", CS412/413 Spring 2008 Slide Set by Tim Teitelbaum.

High IR

High IR is the intermediate code representation closest to the source language, typically high level.

Function Inlining

Replace calls to (simple) functions with the function bodies themselves, avoiding overhead associated with function calls.

Function cloning

Replace function calls where certain arguments are constants with calls to a cloned function where the given argument(s) are replaced with the constant. There might be much to gain if you can simplify the function.

Example:

int foo(int a, int b) {
    for (int i = 0; i < b; b++) {
        a += i;
    }
    return a * 5;
}

If foo is called with b=0, we may instead call a function foo0(int a) which simply returns a*5.

Low IR

Low IR is the intermediate code representation closest to the target language, typically low level.

Constant folding

Evaluate constant expressions and change them with their value (change x = 3 * 2 to x = 6).

Constant propagation

If we know that x has the value 6, we may change subsequent reads of x with the value 6 until x is overwritten.

Copy propagation

To replace the occurrence of direct assignments with their values, e. g.

y = x

z = 3 + y

z = 3 + x

Value numbering

Give a value to each encountered variables and expression, where if two expressions evaluate to the same value, they should hold the same value in order to help in removing redundant code. (Cleanup of that sentence needed).

Dead code elimination

Eliminate code that never changes the program result, both code that has no effect and code that can't possibly be reached. An example of code that has no effect is if you assign a variable with two different values without using the variable inbetween.

Loop-invariant code motion

Loop-invariant code is code that can be moved to outside of a loop without altering the semantics of a program. Say you have a loop which calculates a variable z = x * y inside a loop, but neither x or y is changed inside the loop, the calculation may be moved to before the loop.

Common sub-expression elimination

Wikipedia says it best: "In the expression (a + b) - (a + b)/4, "common subexpression" refers to the duplicated (a + b). Compilers implementing this technique realize that (a + b) won't change, and as such, only calculate its value once."

Strength reduction

Exchange "complicated" computations with "simpler". Example: instead of i*2, do i+i.

Branch prediction/optimization

TODO: put words here.

Loop unrolling

If a loop is executed a constant number of times, you may remove the loop altogether and just copy the function body that many times, avoid loop overhead with branching etc. Might dramatically increase the generated code size. Another way of doing it is not completely unrolling the code, but doing multiple executions of the body between every branch-check.

Assembly

Maximal munch

When generating machine code from the Low IR tree, we want the size of the produced code to be as small as possible. Maximal munch basically tries to minimize the generated code size by choosing instructions that represents as many nodes as possible.

Register allocation

Allocate registers in such a way that you need to go to other memory as little as possible.

Cache optimization

Data-flow analysis

Data-flow analysis-based optimizations are optimizations based on data-flow analysis. A data-flow analysis is a technique for gathering information about the possible set of values calculated at various points in a program.

Control flow graphs (CFG)

A control flow graph represents all code paths that might be traversed through a program during its execution. Each node in the graph represents a basic block. A basic block is a continuous piece of code without jumps or jump targets. Possible jumps are represented as directed edges between nodes. Additionally, one block is marked as an entry block where execution starts, and one block is marked as an exit block, where execution ends.

Reachability

Subgraphs that are unreachable from the entry node can safely be discarded from the graph, as they can never be executed. This is an important technique for dead code elimination.

If the exit block is unreachable, the program has an infinite loop. Although not all infinite loops can be detected, it is still nice to detect some of them!

Reaching definitions

Calculating reaching definitions is a forward-flow dataflow problem. Here's how to solve one.

$$ OUT[B] = transfer(B)(IN[B]) $$

$$ IN[B] = \bigcup_{p \in pred[B]} OUT[p] $$

Where

GEN:	the set of all definitions inside the block that are 'visible' immediately after the block (known as downwards exposed).
KILL:	the union of all definitions killed (overwritten) by the individual statements.

$$ TransferB(IN[B]) = GEN[B] \cup (IN[B] - KILL[B]) $$

Algorithm:

Set all OUTs to Ø
while any changes occur to ay OUT set:
    for each block:
        Update IN[B]
        Update OUT[B]

profit!

Transfer functions

If a variable $a$ has value $v$ before executing the statement $b = a$, both a and b have value v after the statement. This relationship between the data-flow values before and after the assignment statement is known as a transfer function.

Live variable analysis

A variable in a program is live at a point of the program if it holds a value that may be needed in the future. Read up on lattices, because they are useful when doing live variable analysis.

A use-define chain (UD chain) is a data structure that is used in liveness analysis. UD chains are data structures that consist of a use of a variable (U) and all the definitions of said variable that can reach that use without any other intervening definitions (D). Usually, a definition means an assignment of a value to a variable.

A UD chain makes it possible to lookup for each usage of a variable, which earlier assignments its current value depends on. This facilitates splitting variables into several different variables when variables have a definition chain of length 1. This is very useful for putting things into static single assignment form.

Partial Order

A partial order is a relation over elements in a set. For the relation to be a partial order it has to be antisymmetric, transitive and reflexive. The greater than or equal symbol "≤" is an example of a partial order. For a relation to be a partial order, the following conditions must hold for all a, b and c in the set P.

a ≤ a (reflexivity) for all a in P
if a ≤ b and b ≤ a then a = b (antisymmetry)
if a ≤ b and b ≤ c then a ≤ c (transitivity)

It is easy to see that the standard "≤" relation on real numbers fulfill these conditions and is thus a partial order. Another example of a partial ordering is the subset operator "⊆".

Dominator trees

Lemme explain: In a CFG, a node d dominates a node n if every path from the start node to n must go through d. We write: d dom n, or d >> n. A dominator tree is tree where each node's chidren are those nodes it immediately dominates. The start node is the root of the tree. Dominator trees are used for computing static single assigment form for an IR - that is that every variable is assigned exactly once.

Interference graphs

Interference graphs are used for register allocation. An interference graph is a graph where nodes represent variables in a program, and edges between them represent interferences between them. If two variables do not interfere, the same register can be used for both of them, reducing the number of registers needed.

Syntax directed translations

Syntax directed translations refers to a method of compiler implementation where the source language translation is completely driven by the parser. Every syntax rule can have actions that decides attributes for the nodes. For example:

int -> digit {int.value = digit.value}

There are two types of attributes, synthesized and inherited. Synthesized attributes are passed up the parse tree, that is left sides attributes is computed from the right side. Inherited attributes are passed down the parse tree or to siblings. That is the right side attributes are computed from the left side or other attributes on the right side.

We have to classes of of SDDs that can be parsed. The thing with SDDs is that we can not have any cycles, or else there will be no logical way to parse the SDD.

An SDD is S-attributed if every attribute is synthesized. A S-attributed SDD can be evaluated in any buttom up order.

An SDD is L-attributed if every attribute is either synthesized or inherited but follow some rules. In the production. $ A \rightarrow X\_1X\_2...X\_n $

A rule $X_i$ may only use rules that get the value from a inherited attribute associated with the head A or inherited or synthesizeid values associated with the occurences of symbols $X_1, X_2, ... , X_{i-1}$ located to the left of $X_i$. It can also use rules that get values from inherited or sunthezised attributes associated with this occurence of $X_i$ itself, but only in a way that there are no cycles in a dependency graph formed by the attributes of this $X_i$.

Nice tips for the final

The difference between DFAs and NFAs
Scope of identifiers in object-oriented class systems. See slide number 12 of "Lecture 12: Symbol Tables", CS412/413 Spring 2008 Slide Set by Tim Teitelbaum.
Classifying different well-known programming languages as strong/weakly typed, static/dynamic typing (" given these languages, where would you put them? Ex.: fortran, cobol, matlab, python etc")
Function call on objects, dynamic dispatch vector, see slide number 6 of "Lecture 22: Implementing Objects", CS412/413 Spring 2008 Slide Set by Tim Teitelbaum.
When to implement what optimization, see slide number 27 of "Lecture 23: Introduction to Optimizations", CS412/413 Spring 2008 Slide Set by Tim Teitelbaum.
FP ⊑ MFP ⊑ MOP ⊑ IDEAL, viktig i komptek; lecture 27, slide 22; også slide 15 i neste sett
multiple choice tip for compilers: is c weakly typed? yes.

Written by

Stian Jensen sigveseb cristea emiltayl paal pbsds runholm odd harahu Raane sindreij Skarding Lionleaf gombos

Last updated: Tue, 4 Jun 2019 19:49:32 +0200 .