TDT4125: Algorithm Construction, Advanced Course (2013/2014)
Practical information
Note: sections marked with "(2013)" have not been updated to reflect any possible changes in the curriculum between 2013 and 2014.
- The exam is held on the 24$^{\text{th}}$ of May, 2014 at 09:00.
- All printed or handwritten materials may be used during the exam (including the textbook, and even this compendium!). The Citizen SR-270X calculator is also permitted.
- Old exams
Curriculum / reading list
This year (2014), the curriculum is chapters 1 through 8 in The Design of Approximation Algorithms, by David P. Williamson and David B. Shmoys.
The topics covered are:
- An introduction to Approximation Algorithms
- Greedy Algorithms and Local Search
- Rounding Data and Dynamic Programming
- Deterministic Rounding of Linear Programs
- Random Sampling and Randomized Rounding of Linear Programs
- Randomized Rounding of Semidefinite Programs
- The Primal-Dual Method
- Cuts and Metrics
Terminology
Problems and solutions
Feasible:
A feasible solution to a problem is one that is legal with regard to the requirements of the problem or the problem's environment. For example, a feasible solution to the knapsack problem is any subset of items whose total weight does not exceed the maximum allowed weight.
Graphs
Assume we have a graph $G = (V, E)$:

- $V$ is the set of the graph's vertices/nodes. $|V|$ is the number of vertices. The degree of a vertex is the number of edges directly connected to it/with it as one of its endpoints.
- $E$ is the set of the graph's edges/arcs. $|E|$ is the number of edges. An edge can be defined on the form $(i, j)$, where $i$ and $j$ are vertices of the graph. I.e. $(i, j) \in E$, $i \in V$, $j \in V$.
- An edge $e$ is said to be incident on a vertex $v$ if one of its endpoints is $v$.
- A path between two nodes is a series of edges that lead from one node to another. A single edge is a path. "$(i, j), (j, k)$" is a path.
A graph
A directed graph is a graph in which each edge is an ordered pair $(i, j)$, i.e. the edge points from $i$ to $j$.
Two vertices are said to be adjacent if they are connected by an edge.
A graph can be described as metric if, for any vertices $i$ and $k$ in the graph, the length of the direct edge $(i, k)$ is no greater than the length of any path from $i$ to $k$; that is, the triangle inequality holds.
A graph is Eulerian if it contains an Eulerian circuit. An Eulerian circuit is a closed path that visits each edge in the graph exactly once. An undirected graph is Eulerian if and only if it is connected (ignoring isolated vertices) and each node has an even degree.
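That condition can be checked directly. A minimal sketch in Python (the function name and the edge-list representation are my own):

```python
from collections import defaultdict

def is_eulerian(vertices, edges):
    """Check whether an undirected graph is Eulerian: every vertex has
    even degree, and all vertices with at least one edge are connected."""
    degree = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adj[u].add(v)
        adj[v].add(u)
    if any(d % 2 for d in degree.values()):
        return False
    touched = [v for v in vertices if degree[v] > 0]
    if not touched:
        return True
    # depth-first search to check connectivity of the non-isolated part
    seen, stack = {touched[0]}, [touched[0]]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return all(v in seen for v in touched)
```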
A perfect matching of a set of nodes is a collection of edges that contain each node in the set exactly once.
Suppose that
A connected component of an undirected graph is a maximal set of vertices such that every pair of them is connected by a path.
Wicked mathemagics
This section details some of the mathematical "tricks" used in TDoAA to prove things, which you might not remember from whichever part of elementary or high school we were supposed to learn them. TDoAA spends a whole bunch of time proving the performance guarantees of given algorithms. This usually involves stipulating something about the optimal solution for a problem and then using some black voodoo magic to show that there exists an upper bound on the cost of the solution produced by a given algorithm, typically at some factor of the cost of the optimal solution. Unless you're a real whiz at mathemagics, the simple arithmetic might confound you more than the assumptions made about the algorithms or problems themselves. This section hopes to alleviate some of that confusion.
Inequalities
Sums
If
It also implies that there is some
Say you have some sum
"In what scenario would this ever be useful", one wonders.
Oh, who knows, really.
Like, if we knew that
Complexity Classes
The complexity classes include (but are not limited to): P, NP, NP-Complete and NP-Hard.
These four classes contain decision problems, although not all NP-Hard problems are decision problems.
Note that all problems in P are also in NP.
A decision problem is a problem, or question, for which the answer is either "yes" or "no."
Finding the shortest tour through a weighted graph is not a decision problem, but "does there exist a tour with total cost less than $k$?" is.
P
P contains those decision problems that can be solved in polynomial time. Given an instance of a problem in P, it is theoretically possible to produce a solution in polynomial time. However, a problem being in P does not imply that a concrete algorithm is known that solves it in polynomial time, only that such an algorithm exists. See the "Polynomial recognition" part of the Robertson-Seymour theorem article on Wikipedia for an example of this.
All problems in P are also in NP.
NP (nondeterministic polynomial time)
The class NP contains those decision problems for which the instances where the answer is "yes" can be verified in polynomial time.
The decision variant of the knapsack problem is in NP.
It asks if there exists some combination of items whose total value is greater than or equal to a target value, while the total weight does not exceed the capacity. A "yes" answer can be certified by the subset of items itself, and the certificate checked in polynomial time.
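A minimal sketch of such a polynomial-time verifier (the function name and the `(weight, value)` item representation are my own):

```python
def verify_knapsack(items, capacity, target, certificate):
    """Polynomial-time verifier for the knapsack decision problem.
    items: list of (weight, value) pairs; certificate: indices of the
    chosen items. Accepts iff the choice is feasible and valuable enough."""
    total_weight = sum(items[i][0] for i in certificate)
    total_value = sum(items[i][1] for i in certificate)
    return total_weight <= capacity and total_value >= target
```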
NP-Complete
A problem $U$ is NP-complete if $U$ is in NP and every problem in NP can be reduced to $U$ in polynomial time.
A polynomial-time reduction from $A$ to $B$ is an algorithm that, in polynomial time, transforms any instance of $A$ into an instance of $B$ such that the answer to the $A$-instance is "yes" if and only if the answer to the constructed $B$-instance is "yes".
We can reduce an instance of the Hamiltonian cycle decision problem ("Does this undirected graph $G$ contain a Hamiltonian cycle?") to the decision version of TSP: give each edge of $G$ weight 1 and each missing edge weight 2, and ask whether the resulting complete graph has a tour of total cost at most $|V|$.
NP-Hard
"NPhard problems are at least as hard as the hardest problems in NP."
A problem $H$ is NP-hard if every problem in NP can be reduced to $H$ in polynomial time. $H$ itself does not have to be in NP.
The optimization version of the knapsack problem is NP-hard: given an instance of the decision version of the knapsack problem, we just check whether the value of the optimal solution is at least the value which the decision version asks about.
Linear programming (LP)
"LP" can refer to both linear programming and a linear program.
A linear program is formulated in terms of some number of decision variables that represent some decision that needs to be made. The variables are constrained by linear inequalities and equalities. Any assignment of real numbers to the variables such that all of the constraints are satisfied is a feasible solution. A typical linear program also has an objective function whose value is dictated by the decision variables, and the purpose is to find a feasible solution that maximizes or minimizes the value of the objective function.
Linear programs can be solved in polynomial time.
An LP can be presented in the form: maximize $c^\top x$ subject to $Ax \le b$ and $x \ge 0$, where $x$ is the vector of decision variables, $c$ holds the objective coefficients, and $A$ and $b$ describe the linear constraints.
An LP formulation of the knapsack problem: maximize $\sum_i v_i x_i$ subject to $\sum_i w_i x_i \le W$ and $0 \le x_i \le 1$ for every item $i$, where $v_i$ and $w_i$ are the value and weight of item $i$, and $W$ is the weight limit.
Would it make more sense if we restricted each $x_i$ to be either 0 or 1? Yes, but then it would no longer be a linear program; see below.
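The LP relaxation of the knapsack problem (allowing fractional $x_i$) can be solved exactly with a simple greedy rule rather than a general LP solver: take items in decreasing value-to-weight order, finishing with a fraction of the last item. A sketch, assuming positive weights (the function name is my own):

```python
def fractional_knapsack(values, weights, capacity):
    """Solve the LP relaxation of knapsack: items sorted by value/weight
    ratio are taken greedily, the last one possibly fractionally."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    x = [0.0] * len(values)  # the decision variables, 0 <= x_i <= 1
    remaining = capacity
    total = 0.0
    for i in order:
        take = min(1.0, remaining / weights[i])
        x[i] = take
        total += take * values[i]
        remaining -= take * weights[i]
        if remaining <= 0:
            break
    return total, x
```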
Integer Program (IP)
Synonyms: Integer Linear Program (ILP)
An integer program is like a linear program except that the decision variables are restricted to being integers rather than real numbers.
Solving integer programs is NP-hard.
0-1 Integer Linear Program (0-1 ILP)
Synonyms: Binary LP, 0-1 LP, Binary ILP.
A 0-1 integer program is like a linear program except that the decision variables are restricted to being either 0 or 1.
In the LP formulation of the knapsack problem above we would have to replace the constraint $0 \le x_i \le 1$ with $x_i \in \{0, 1\}$.
Formal languages (2013)
Alphabet
Any nonempty, finite set is called an alphabet. Every element of an alphabet $\Sigma$ is called a symbol of $\Sigma$.
Example alphabets:
$\Sigma_{\text{bool}} = \left\{0, 1\right\}$ $\Sigma_{\text{latin}} = \left\{a, b, c, d, e , ... , z \right\}$ $\Sigma_{\text{logic}} = \left\{0, 1, (,), \wedge, \vee, \neg, x \right\}$
Word
Let $\Sigma$ be an alphabet. A word over $\Sigma$ is any finite sequence of symbols of $\Sigma$. The empty word $\Lambda$ is the only word consisting of zero symbols. The set of all words over the alphabet $\Sigma$ is denoted by $\Sigma^*$.
Language
Let $\Sigma$ be an alphabet. Any set of words over $\Sigma$, i.e. any subset $L \subseteq \Sigma^*$, is called a language over $\Sigma$.
$\text{Time}_A(x)$ and $\text{Space}_A(x)$
Let $\Sigma_{\text{input}}$ and $\Sigma_{\text{output}}$ be alphabets. Let $A$ be an algorithm that realizes a mapping from $\Sigma_{\text{input}}^*$ to $\Sigma_{\text{output}}^*$. For every $x \in \Sigma_{\text{input}}^*$, $\text{Time}_A(x)$ denotes the time complexity of the computation of $A$ on the input $x$, and $\text{Space}_A(x)$ denotes the space complexity of the computation of $A$ on $x$.
Time complexity of an algorithm
Let $\Sigma_{\text{input}}$ and $\Sigma_{\text{output}}$ be two alphabets. Let $A$ be an algorithm that computes a mapping from $\Sigma_{\text{input}}^*$ to $\Sigma_{\text{output}}^*$. The worst case time complexity of $A$ is a function $\text{Time}_A: \mathbb{N} \setminus \{0\} \to \mathbb{N}$ defined by $\text{Time}_A(n) = \max\{\text{Time}_A(x) \mid x \in \Sigma_{\text{input}}^n\}$ for every positive integer $n$.
Complexity classes (2013)
Time
| Complexity class | Model of computation | Resource constraint |
| --- | --- | --- |
| DTIME($f(n)$) | Deterministic Turing machine | Time $O(f(n))$ |
| P | Deterministic Turing machine | Time $\text{poly}(n)$ |
| EXPTIME | Deterministic Turing machine | Time $2^{\text{poly}(n)}$ |
| NTIME($f(n)$) | Non-deterministic Turing machine | Time $O(f(n))$ |
| NP | Non-deterministic Turing machine | Time $\text{poly}(n)$ |
| Co-NP | TBA | TBA |
| NEXPTIME | Non-deterministic Turing machine | Time $2^{\text{poly}(n)}$ |
Space
| Complexity class | Model of computation | Resource constraint |
| --- | --- | --- |
| DSPACE($f(n)$) | Deterministic Turing machine | Space $O(f(n))$ |
| L | Deterministic Turing machine | Space $O(\log n)$ |
| PSPACE | Deterministic Turing machine | Space $\text{poly}(n)$ |
| EXPSPACE | Deterministic Turing machine | Space $2^{\text{poly}(n)}$ |
| NSPACE($f(n)$) | Non-deterministic Turing machine | Space $O(f(n))$ |
| NL | Non-deterministic Turing machine | Space $O(\log n)$ |
| NPSPACE | Non-deterministic Turing machine | Space $\text{poly}(n)$ |
| NEXPSPACE | Non-deterministic Turing machine | Space $2^{\text{poly}(n)}$ |
Cost measurement (2013)
Time efficiency estimates depend on what is defined to be a single step, which takes a single time unit to execute. Two cost models are generally used:
Uniform-cost measurement
Every machine operation is assigned a single constant time cost. This means that an addition between two integers, as an example, is assumed to take the same amount of time regardless of the size of the integers. This is often the case in practical systems for sensibly sized integers. This is the cost measurement used throughout Hromkovič.
Logarithmic-cost measurement
Cost is assigned to the number of bits involved in each operation. This is more cumbersome to use, and is therefore usually only applied when necessary.
Decision problems (2013)
A decision problem is a question with a yes-or-no answer, depending on the input parameters. Formally, it is represented as a triple $(L, U, \Sigma)$, where $\Sigma$ is an alphabet and $L \subseteq U \subseteq \Sigma^*$. An algorithm $A$ solves the decision problem if, for every input $x \in U$:
$A(x) = 1$ if $x \in L$, and $A(x) = 0$ if $x \in U \setminus L$.
A decision problem is equivalent to a language.
Famous NP-complete problems
This is a list of important or otherwise famous NP-complete problems, which are probably nice to know by heart, as they are assumed to be known and can therefore be used freely in proofs.
Hamiltonian cycle problem (HC)
The Hamiltonian cycle problem is the problem of determining whether a given graph contains a Hamiltonian cycle.
Satisfiability problem (SAT)
The boolean satisfiability problem is the problem of determining whether a given boolean expression in CNF (conjunctive normal form) can be satisfied. An expression in CNF is an expression of the form $(l_{1,1} \vee \dots \vee l_{1,k_1}) \wedge (l_{2,1} \vee \dots \vee l_{2,k_2}) \wedge \dots \wedge (l_{m,1} \vee \dots \vee l_{m,k_m})$, where each literal $l_{i,j}$ is either a variable or the negation of a variable.
Clique problem (CLIQUE)
The clique problem is the problem of determining whether there exists a clique of size $k$ in a given graph, i.e. a set of $k$ vertices that are all pairwise adjacent.
Vertex cover problem (VCP)
The vertex cover problem is the problem of determining whether a given graph $G$ has a vertex cover of size at most $k$: a set of vertices such that every edge of $G$ is incident on at least one vertex in the set.
VCP can be approximated via maximal matching with a factor of 2: find any maximal matching and take both endpoints of every matched edge.
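A hedged sketch of the matching-based approximation (my own code): every matched edge forces at least one of its endpoints into any cover, so the result is at most twice the size of an optimal cover.

```python
def vertex_cover_2approx(edges):
    """2-approximate vertex cover: build a maximal matching greedily
    and take both endpoints of every matched edge."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:  # edge is still unmatched
            cover.add(u)
            cover.add(v)
    return cover
```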
Set cover problem (SCP)
The decision version of the set cover problem asks whether, given a universe of elements $U$, a collection $S$ of subsets of $U$, and a number $k$, there exists a subcollection of at most $k$ sets from $S$ whose union is $U$.
SCP can be approximated greedily by repeatedly adding the set that contains the largest number of uncovered elements to the set cover. This has an approximation factor of $H_n = \sum_{i=1}^{n} \frac{1}{i} = O(\log n)$, where $n$ is the number of elements.
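The greedy rule can be sketched as follows (the function name and input representation are my own):

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: repeatedly pick the subset that covers the
    largest number of still-uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(subsets)),
                   key=lambda i: len(subsets[i] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen
```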
SCP can be represented as an IP: minimize $\sum_i x_i$ subject to $\sum_{i : e \in S_i} x_i \ge 1$ for every element $e \in U$, with $x_i \in \{0, 1\}$, where $x_i = 1$ means that the set $S_i$ is included in the cover.
Travelling salesman problem (TSP)
The travelling salesman problem is the problem of finding the shortest Hamiltonian cycle in a complete weighted graph. While the optimization version of TSP is NP-hard, the decision version of TSP is NP-complete.
TSP, with the restriction that the triangle inequality holds, can be approximated by Christofides' algorithm with a ratio of $3/2$.
Pseudo-Polynomial Time Algorithms (2013)
A numeric algorithm runs in pseudo-polynomial time if its running time is polynomial in the numeric value of the input. NP-complete problems with pseudo-polynomial time algorithms are called weakly NP-complete. NP-complete problems that are proven to not have a pseudo-polynomial time algorithm (unless $P = NP$) are called strongly NP-complete. Strong and weak kinds of NP-hardness are defined analogously.
Formally:
Let $U$ be an integer-valued problem, and let $A$ be an algorithm that solves $U$. $A$ is a pseudo-polynomial-time algorithm for $U$ if there exists a polynomial $p$ of two variables such that $\text{Time}_A(x) = O(p(|x|, \text{MaxInt}(x)))$ for every instance $x$ of $U$.
Example: Primality test
Consider the decision problem of whether a number $n$ is prime. Trial division by every integer up to $\sqrt{n}$ takes time polynomial in the value of $n$, but exponential in the size (number of bits) of the input, so it is a pseudo-polynomial-time algorithm.
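A sketch of the trial-division test (my own code):

```python
def is_prime_trial_division(n):
    """Pseudo-polynomial primality test by trial division: the number
    of iterations is about sqrt(n), polynomial in the value of n but
    exponential in the number of bits needed to write n down."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True
```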
Parametrized complexity (2013)
Fixedparameter polynomial time algorithms
A parametrized problem is a problem together with a parametrization function Par that assigns each input instance $x$ an integer Par($x$). An algorithm is a fixed-parameter (Par-parametrized) polynomial-time algorithm if its running time is bounded by $f(\text{Par}(x)) \cdot p(|x|)$ for some function $f$ and polynomial $p$.
Example: SAT
SAT can be parametrised by the number of variables. A given boolean expression of size $n$ over $k$ variables can be decided in time $O(2^k \cdot n)$ by simply trying all $2^k$ truth assignments.
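The brute-force procedure can be sketched like this (my own code; clauses are lists of nonzero integers, negative meaning a negated variable, as in the common DIMACS convention):

```python
from itertools import product

def sat_brute_force(clauses, k):
    """Decide satisfiability of a CNF formula over variables 1..k by
    trying all 2^k assignments; each check is linear in formula size."""
    for bits in product([False, True], repeat=k):
        def value(lit):
            v = bits[abs(lit) - 1]
            return v if lit > 0 else not v
        if all(any(value(lit) for lit in clause) for clause in clauses):
            return True
    return False
```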
Approximation algorithms (2013)
Approximation algorithms are algorithms that calculate an approximate answer to an optimization problem. Informally, approximation algorithms are said to have an approximation ratio of $\rho$ if the cost of the solution they produce is guaranteed to be within a factor $\rho$ of the cost of the optimal solution.
Polynomialtime approximation scheme (PTAS)
A polynomial-time approximation scheme is an approximation algorithm that takes an additional parameter $\varepsilon > 0$ and produces a solution within a factor $1 + \varepsilon$ of optimal (or $1 - \varepsilon$ for maximization problems), in time polynomial in the input size for every fixed $\varepsilon$.
Fully polynomialtime approximation scheme (FPTAS)
A PTAS is an FPTAS if its running time is polynomial in both the input size and $1/\varepsilon$.
Local Search (2013)
Local search is a metaheuristic method for optimization problems. Local search moves from solution to solution in the search space by applying local changes, until a sufficiently good solution is found or a time bound has elapsed.
Straight up regular local search
For your basic local search you need the following:

- A neighbor graph showing the relation between neighboring solution candidates
- A fitness function evaluating candidate solutions
- Good luck
Straight up regular local search will quickly find a local optimum near where the search is started, but it does not guarantee finding a global optimum, nor does it know whether one has been found.
There are several ways of selecting a neighbor to follow when doing a local search, two of which are first improvement and steepest descent. Steepest descent evaluates all neighbors and selects the best one. First improvement selects the first neighbor which is better than the current solution candidate. Each step of first improvement is often faster to compute, but steepest descent usually converges to a local optimum in fewer steps.
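Both strategies fit one generic loop. A hedged sketch (names are my own; `neighbors` and `cost` are supplied by the problem at hand):

```python
def local_search(start, neighbors, cost, steepest=True):
    """Generic local search for minimization. steepest=True inspects
    every neighbor and moves to the best one; steepest=False moves to
    the first improving neighbor found."""
    current = start
    while True:
        best, best_cost = current, cost(current)
        for candidate in neighbors(current):
            if cost(candidate) < best_cost:
                best, best_cost = candidate, cost(candidate)
                if not steepest:
                    break  # first improvement: take it immediately
        if best == current:
            return current  # no neighbor improves: local optimum
        current = best
```

For example, minimizing $x^2$ over the integers with neighbors $x \pm 1$ walks straight down to 0.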
Variabledepth search
Variable-depth search provides a nice compromise between first improvement and steepest descent. In variable-depth search, changed variables in a solution are locked, preventing their further modification in that branch. Kernighan-Lin is the canonical example of variable-depth search.
Simulated annealing
Simulated annealing introduces probability and randomness into which neighbor is selected. This reduces the chance of getting stuck in a poor local optimum.
Tabu search
Tabu search marks certain solutions as tabu after processing, and never revisits them. This is often done on an LRU or LFU basis. A common implementation is a search which remembers the last $k$ visited solutions and refuses to revisit them.
Tabu search prevents a candidate solution from being considered repeatedly, which would otherwise get the search stuck in plateaus.
Randomized Tabu search
Like regular Tabu, but randomized (or so the name seems to suggest).
Intensification vs Diversification
Intensification is the intense search of a relatively small area, that is, the exploitation of a discovered good area. Diversification is looking at many diverse regions, exploring uncharted territories. Intensification is often quicker at reaching a local optimum, but diversification is better geared towards discovering global optima.
Branch-and-bound (B&B) (2013)
A branch-and-bound algorithm systematically enumerates all candidate solutions, discarding large subsets of fruitless candidates en masse by using upper and lower estimated bounds on the quantity being optimized.
Canonically, B&B is used to minimize a function $f(x)$ over a set $S$ of candidate solutions. It requires two procedures:

- For branching: a splitting procedure that splits a candidate solution set $S$ into $S_1, S_2, ..., S_n$ such that $S = \bigcup \left\{S_1, S_2, ..., S_n \right\}$, $n \ge 2$, and the minimum of $f(x)$ over $S$ is $\min\left\{v_1, v_2, ..., v_n\right\}$, where each $v_i$ is the minimum of $f(x)$ within $S_i$.
- For bounding: a procedure that computes upper and lower bounds for the minimum value of $f(x)$ over a given subset of $S$.
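As an illustration, here is a sketch of the scheme applied to 0-1 knapsack (a maximization problem, so the bound is an optimistic upper bound; all names are my own). The bound used is the fractional relaxation of the remaining items:

```python
def knapsack_branch_and_bound(values, weights, capacity):
    """Exact 0-1 knapsack via branch and bound. Candidate solution sets
    are split by deciding items one at a time; a branch is pruned when
    its optimistic (fractional) bound cannot beat the best complete
    solution found so far."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    best = 0

    def bound(idx, cap, val):
        # optimistic estimate: fill the leftover capacity fractionally
        for i in order[idx:]:
            if weights[i] <= cap:
                cap -= weights[i]
                val += values[i]
            else:
                return val + values[i] * cap / weights[i]
        return val

    def branch(idx, cap, val):
        nonlocal best
        best = max(best, val)
        if idx == len(order) or bound(idx, cap, val) <= best:
            return  # prune this subset of candidate solutions
        i = order[idx]
        if weights[i] <= cap:
            branch(idx + 1, cap - weights[i], val + values[i])  # take it
        branch(idx + 1, cap, val)  # leave it

    branch(0, capacity, 0)
    return best
```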
For an example of branch and bound used with integer programming, see Algorithm Animations for Practical Optimization: A Gentle Introduction.
Genetic algorithms (2013)
Genetic algorithms mimic evolution to obtain practically acceptable solutions to optimization problems. See the genetic algorithm template under Layman's guides for an overview of how to make a genetic algorithm.
Some geneticdomainspecific words need a mapping into the algorithmics domain:
| Genetics term | Corresponding algorithmics term |
| --- | --- |
| an individual | a string (or vector) representation of a candidate solution |
| fitness value | cost function |
| population | a subset of the set of feasible candidate solutions |
| mutation | a random local transformation of an individual |
Here, the different stages of the genetic algorithm as described in the template are explained in more detail:
Initialization
Initialization is the step of creating a starting population of individuals, typically by generating them at random.
Selection
Selection is the step of selecting which individuals from the current population are allowed to reproduce, typically favouring those with better fitness scores.
Genetic operators
Genetic operators work on individuals to make new, different individuals.
Crossover/recombination
Crossover is when you combine two individuals to make an offspring individual which contains combined parts from both its parents. One way of performing a crossover is to use the first half of the first individual (a string or vector representation of a candidate solution, remember) and the second half of the second individual.
As a simple example: consider a genetic algorithm trying to find a string of length 10 that contains only vowels. The two individuals lsjifasdfw and areighifpo, recombined using the previously described method, would make a new individual lsjifhifpo.
Mutation
Mutation does a slight modification of a single individual to form a new individual. Methods include flipping a single bit of the individual, adding a random number to one of the individual's vector elements, generation-number-dependent modifications, etc.
Following the offspring individual from the recombination example: lsjifhifpo could mutate to lsjifaifpo, using the mutation rule that a single character in the string is swapped with a random character.
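The two operators from the examples above can be sketched directly (my own code; the mutation draws a random lowercase replacement):

```python
import random

def crossover(a, b):
    """One-point crossover at the midpoint: first half of parent a,
    second half of parent b."""
    mid = len(a) // 2
    return a[:mid] + b[mid:]

def mutate(s, rng):
    """Swap a single random position for a random lowercase letter."""
    i = rng.randrange(len(s))
    return s[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + s[i + 1:]
```

`crossover("lsjifasdfw", "areighifpo")` reproduces the lsjifhifpo offspring from the example above.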
(regrouping)
(colonization-extinction)
(migration)
Termination
Linear programming (LP) (2013)
Also known as linear optimization. LP is the process of finding the "best" values obtainable for a linear function of variables constrained by linear inequalities. LP is solvable in polynomial time.
Integer programming (IP)
IP is linear programming where the variables can only take integer values. IP is NP-hard.
Binary linear programming (0-1 LP)
0-1 LP is linear programming where the variables can only take 0 or 1 as values. 0-1 LP is NP-hard.
Relaxation to LP
Many interesting optimization problems with useful solutions are easily reduced to IP or 0-1 LP. Because IP and 0-1 LP are both NP-hard, while LP is in P, relaxing problems to LP is a good idea when possible. Here is how:
1. Express an optimization problem $U$ as an instance $I$ of IP or 0-1 LP.
2. Pretend that $I$ is an instance of LP, and find a solution $a$.
3. Use $a$ to solve the original problem.
Easy, right?
Layman's guides (2013)
This section contains some useful guides to various methods and techniques for algorithm design.
Parametrized complexity applications
So you have an NP-hard problem. Nice. And difficult. So difficult that it is impossible to solve in polynomial time (unless $P = NP$). Here is what to do:
- Define a subproblem mechanism for defining subproblems according to their supposed subproblem difficulty.
- Use this mechanism to define a class of easy subproblems.
- Define a requirement for an algorithm to have a nice time complexity with respect to the subproblem difficulty, in such a way that a nice algorithm can solve an easy subproblem in polynomial time.
- Then, either:
  - design a nice algorithm, and prove that it is nice, or
  - prove that it is impossible to design a nice algorithm, unless $P = NP$.
Look at these categories for ideas for the different subproblem mechanisms, proofs, etc:
Pseudo-polynomial-time algorithms

| | |
| --- | --- |
| Subproblem mechanism | Include only instances where the maximal integer in the input is bounded by a non-decreasing function $h$ of the input size. |
| Easy subproblems | Those where the bounding function $h$ is a polynomial. |
| Nice algorithms | Pseudo-polynomial-time algorithms. |
| Designing the algorithm | Just design it |
| Proving that there are no nice algorithms | Prove that the problem remains NP-hard even when restricted to instances whose integers are bounded by a polynomial in the input size (i.e. prove that it is strongly NP-hard). |
Parametrized complexity

| | |
| --- | --- |
| Subproblem mechanism | Define a parametrization function Par to give each input instance an integer score measuring its difficulty. Higher Par(x) means a harder instance. Subproblems can be defined by including only instances where Par(x) takes a specific value or range. |
| Easy subproblems | Those where Par(x) is small and does not depend on input size. |
| Nice algorithms | Par-parametrized polynomial-time algorithms. |
| Designing the algorithm | Just design it |
| Proving that there are no nice algorithms | Choose a constant $k$ and prove that the subproblem where Par(x) = $k$ is NP-hard. |
Local search algorithm design

1. Make sure you have an optimization problem and not a decision problem.
2. Define a neighborhood function. This function maps, for an input instance x, each feasible solution a for x to a set of other feasible solutions for x, called the neighbors of a. Typically, neighboring solutions differ only by some small modification.
3. Start the search anywhere you want. Randomly choosing some place is a nice strategy.
4. Repeat the following: go to the best neighbor (or don't, if all neighbors suck, comparatively).
5. In the case where all neighbors suck, give up. You have a local optimum and the search is done.
Genetic algorithm template
1. Randomly generate a population of individuals.
2. Apply a fitness function to calculate a fitness score for each individual in the current generation.
3. Select individuals for reproduction based on fitness and a little randomness.
4. Apply crossover and mutation to the selected individuals to produce the next generation.
5. Stop whenever you feel like it.
Proving that a decision problem is undecidable
You have a decision problem E which you suspect is undecidable. Unfortunately, you need proof. A general method is:

1. Find a different decision problem H which is known to be undecidable. I recommend the halting problem.
2. Suppose a decider R decides E. Define a decider S that decides H using R.
3. If R exists, S can solve H. However, H cannot be solved. Therefore, R cannot exist.
The PrimalDual schema
(taken from cmu.edu)
We typically devise algorithms for minimization problems in the following way:
- Write down an LP relaxation of the problem, and find its dual. Try to find some intuitive meaning for the dual variables.
- Start with vectors $x = 0$, $y = 0$, which will be dual feasible, but primal infeasible.
- Until the primal is feasible:
  - increase the dual values $y_i$ in some controlled fashion until some dual constraint(s) goes tight (i.e. until $\sum_i y_i a_{ij} = c_j$ for some $j$), while always maintaining the dual feasibility of $y$;
  - select some subset of the tight dual constraints, and increase the primal variables corresponding to them by an integral amount.
- For the analysis, prove that the output pair of vectors $(x, y)$ satisfies $c^\top x \le \rho \cdot y^\top b$ for as small a value of $\rho$ as possible. Keep this goal in mind when deciding how to raise the dual and primal variables.
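As a concrete illustration, here is a sketch of the schema specialized to weighted vertex cover, a standard textbook example (all names are my own). The dual has one variable $y_e$ per edge, constrained by $\sum_{e \text{ incident on } v} y_e \le c_v$; raising $y_e$ until a vertex constraint goes tight and buying the tight vertices gives a 2-approximation:

```python
def primal_dual_vertex_cover(costs, edges):
    """Primal-dual 2-approximation for weighted vertex cover. For each
    uncovered edge, raise its dual value until some endpoint's
    constraint goes tight, then add every newly tight vertex."""
    slack = dict(costs)  # remaining slack: c_v minus incident duals
    cover = set()
    for u, v in edges:
        if u in cover or v in cover:
            continue  # primal constraint for this edge already satisfied
        delta = min(slack[u], slack[v])  # how far y_e can be raised
        slack[u] -= delta
        slack[v] -= delta
        if slack[u] == 0:
            cover.add(u)
        if slack[v] == 0:
            cover.add(v)
    return cover
```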
Christofides' algorithm
Use this to approximate TSP instances where the triangle inequality holds:

1. Create the MST $T$ of $G$.
2. Let $O$ be the set of vertices with odd degree in $T$, and find a minimum-weight perfect matching $M$ in the complete graph over $O$.
3. Combine the edges of $M$ and $T$ to form a multigraph $H$.
4. Form an Eulerian circuit in $H$.
5. Make the circuit found in the previous step Hamiltonian by skipping visited nodes ('shortcutting').

You now have an approximation of TSP with ratio $3/2$.
Converting problems
VCP to SCP
This is how to turn a VCP instance into an SCP instance. We use the variables $G = (V, E)$ for the VCP graph, and $U$ and $S$ for the SCP variables.

- Let $U$ be $E$.
- Let $S_i$ be the set of edges incident on vertex $i$.

And that's it!
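The conversion is nearly a one-liner in code. A sketch (names are my own; edges are tuples of their two endpoints):

```python
def vcp_to_scp(vertices, edges):
    """Turn a vertex cover instance G = (V, E) into a set cover
    instance: the universe is E, and S_i contains the edges incident
    on vertex i. A vertex cover of size k corresponds exactly to a
    set cover of size k."""
    universe = set(edges)
    subsets = {v: {e for e in edges if v in e} for v in vertices}
    return universe, subsets
```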
Appendix: list of algorithms featured in the textbook (2013)
Note: From the 2013 textbook (Algorithmics for Hard Problems, by Hromkovič).
- 156, DPKP
- 165, Ford-Fulkerson
- 172, Vertex cover algorithm
- 177, B&B for MAXSAT and TSP
- 187, D&C 3SAT
- 191, Local search scheme
- 197, Kernighan-Lin variable-depth search
- 434, Metropolis algorithm
- 436, Simulated Annealing
- 442, Tabu search
- 443, Randomized tabu search
- 446, Genetic algorithm scheme
- 214, Methods for relaxing problems to LP
- 226, Simplex
- 228, SCP(k) relaxation to LP
- 237, Primal-dual scheme
- 238, Primal-dual (MIN-VCP)
- 250, Greedy makespan schedule
- 262, Algorithm for VCP
- 264, Algorithm for SCP
- 268, Algorithm for WEIGHT-VCP
- 269, Algorithm for MAX-CUT
- 272, Greedy simple KP
- 273, PTAS for SKP
- 278, Modified PTAS for SKP
- 280, FPTAS for KP
- 283, TSP with triangle inequality, 2-approximation
- 288, Christofides' algorithm
- 301, Sekanina's algorithm
Appendix: proofs (2013)
(TODO: write the actual proofs)
- 201, HC $\le_p$ RHC
- 201, RHC $\le_p$ SUBOPT_TSP
Curriculum / reading list (previous semesters)
2013, spring
This year (2013), the curriculum is chapters 3, 4 and 6, except 4.3.6-4.5, of Hromkovič's Algorithmics for Hard Problems.
The topics covered are:
- Deterministic approaches
  - Pseudo-polynomial-time algorithms
  - Parametrized complexity
  - Branch and bound
  - Lowering worst case complexity of exponential algorithms
  - Local search
  - Relaxation to linear programming
- Approximation algorithms
  - Fundamentals: stability, dual approximation, etc.
  - Algorithm design: lots of approximations for known hard problems
- Heuristics
  - Simulated annealing
  - Tabu search
  - Genetic algorithms