Table of Contents
  1. What are algorithms?
  2. Basic Data structures
    1. Linked lists
      1. Runtimes
    2. Abstract Data structures
      1. Queue
      2. Stack
      3. Heap
      4. Hash
        1. Hash functions
        2. Runtimes
  3. Runtime Calculations
    1. About Runtimes
    2. Common Runtimes
  4. Recursion
    1. The Master Theorem
      1. Example
  5. Sorting and Search
    1. Search
      1. Brute force
      2. Binary search
    2. Sorting
      1. Stability
      2. Comparison Based Sorting Algorithms
        1. Merge sort
        2. Quicksort
        3. Bubblesort
        4. Insertion Sort
        5. Selection sort
      3. Other Sorting Algorithms
        1. Heapsort
        2. Counting sort
        3. Radix sort
        4. Bucket sort
        5. Topological sort
  6. Graphs and Graph Algorithms
    1. Representation
      1. Neighbor Lists
      2. Neighbor Matrices
    2. Traversal
      1. Breadth First Search (BFS)
      2. Depth First Search (DFS)
      3. Traversing order
    3. Minimal Spanning Trees
      1. Kruskal
      2. Prim
    4. Shortest Path
      1. One-to-All
        1. Relax
        2. Dijkstra's algorithm
        3. Bellman-Ford
        4. DAG shortest path
      2. All to All
        1. Johnson's algorithm
        2. Floyd-Warshall
    5. Max Flow
      1. Flow Network
      2. Residual Network
      3. Augmenting Path
      4. Minimal cut
        1. Ford-Fulkerson's method
        2. Edmonds-Karp
      5. Whole Number Theorem
  7. Dynamic Programming
    1. Longest common substring
    2. Rod-cutting
  8. Greedy Algorithms
    1. Planning activities
    2. Huffman's algorithm
  9. Multithreading
    1. The basics of Multithreading
      1. Concurrency keywords
      2. Performance Measures
  10. Problem Complexity
    1. P, NP, NPC
      1. Reducibility-relation
      2. Some known NPC problems
  11. Runtimes for curriculum algorithms
    1. Sorting and selection
    2. Heap-operations
    3. Graph-operations
    4. Other algorithms

ABC123: Personal version of algdat

If you are looking for an overview of all runtimes, you will find them at the bottom, under Runtimes for Curriculum Algorithms.

What are algorithms?

An informal definition: An algorithm is any clearly defined method for calculations that can take a value (or a set of values) as input and return a value (or a set of values) as output.

One can also view it as a tool that can solve a defined calculation problem. When defining the problem, one has to describe which relationship is desired between input and output, for example: «Input: A city road map and two points. Output: The shortest path (measured in meters) between the two points.»

Description of solution: There is no general standard for describing an algorithm. You can describe it with natural language, pseudo code, as code or through hardware drawings. The only requirement is that the description is precise.

Instances: Each collection of input values to a problem is called an instance. For example, an instance of the problem above could have as input a road map of Trondheim, with the two points being the geographic coordinates of NTNU Gløshaugen and NTNU Dragvoll.

Correct or incorrect?: An algorithm is either correct or incorrect. If an algorithm is correct, it gives the correct output for every possible instance; a correct sorting algorithm, for example, must be able to sort every possible collection of positive whole numbers. If it is incorrect, this is not the case. For example, you may have invented an algorithm that "solves" sorting by returning the list in reverse. This returns the correct output for some instances, but only for a subset of the possible instances.

Basic Data structures

This course covers several different data structures with respect to various problems. It is therefore important to have a general overview of the most important ones.

Linked lists

Singly linked list (Public Domain, Lindisi)

A linked list is a basic linear structure that represents elements in sequence. The concept behind the structure is that the sequence of elements is preserved by each element pointing to the next in the sequence (see the figure above). Explained in code:

class Node:
    def __init__(self):
        self.value = None
        self.next = None

n1 = Node()
n2 = Node()
n3 = Node()
n1.value = 1
n2.value = 2
n3.value = 3
n1.next = n2
n2.next = n3

With the code above the structure will be n1 -> n2 -> n3

We can also have doubly linked lists, where each node contains a value and pointers to both the previous and the next node.
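
A minimal sketch of such a node in Python (an illustrative example, not from the original):

class DNode:
    def __init__(self, value=None):
        self.value = value
        self.prev = None   # pointer to the previous node in the sequence
        self.next = None   # pointer to the next node in the sequence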

Runtimes

Action Runtime
Insert at the start $\text{O}(1)$
Insert at the end $\text{O}(n)$
Lookup $\text{O}(n)$
Delete element lookup + $\text{O}(1)$

Abstract Data structures

Queue

A queue is an abstract data structure that preserves the elements' sequence and has two operations, enqueue and dequeue. Enqueue is the insert function, which inserts elements at the back of the queue. Dequeue is the extraction function, which retrieves the first element at the front of the queue. A queue is therefore a FIFO data structure (First In, First Out).

A simple queue

Stack

A stack is an abstract data structure which, similar to a queue, preserves the elements' sequence. Unlike a queue, however, a stack is LIFO (Last In, First Out), meaning that it both inserts and extracts elements from the back of the structure. These operations are called push and pop respectively.

A simple stack
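
A minimal sketch of both structures with Python's built-in types (an illustrative example; deque gives constant-time operations at both ends):

from collections import deque

q = deque()
q.append(1)      # enqueue 1
q.append(2)      # enqueue 2
q.popleft()      # dequeue -> 1 (FIFO)

s = []
s.append(1)      # push 1
s.append(2)      # push 2
s.pop()          # pop -> 2 (LIFO)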

Heap

A heap is a list which can be viewed as a binary tree. A heap is an implementation of the priority queue data structure. See Heapsort below.

Max heap property: No node can have a higher value than its parent node.

Hash

The purpose of creating hash tables (or hash maps) is to arrange elements into a predetermined number of groups. Each element passes its key (this can be its value) to a hashing function, which determines which group the element is placed in. Multiple elements may be placed in the same group. These collisions aren't inherently bad, as they can be dealt with, but they may lead to slower runtimes. Common ways to resolve collisions are chaining and open addressing. The former creates linked lists (chains) for elements with colliding hashes, while the latter moves a colliding element to a different hash. The latter can be achieved with methods like linear probing, quadratic probing, or double hashing.

Bucket sort uses a similar concept to group unsorted elements.

Hash functions

Let $n$ be the number of elements each with key $k$, and $m$ the length of the hash array (number of groups). The load factor for hash tables is defined as $\alpha = \frac{n}{m}$. Note that for a hashing function that produces pseudo-random hashes, the average number of elements in each group will be $\alpha$. Common practice is to double the size of the hash array when $\alpha$ gets close to $1$.

Some hash functions include:

  • $h(k) = k \;\mathrm{mod}\; m$
  • $h(k) = \lfloor m(kA \;\mathrm{mod}\; 1) \rfloor$ where $0 < A < 1$

To clarify, the first function subtracts $m$ from $k$ until $k$ is in $[0, m-1]$. The second multiplies $k$ with some float $A$, keeps only the fractional part, then multiplies by $m$ and rounds down. Given that $k$ is random, both functions will map a set of keys uniformly to integers in the range $[0, m-1]$.
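
As a sketch, the two functions above could be written in Python like this (the value of A below is an arbitrary constant in (0, 1), chosen only for illustration):

import math

def hash_division(k, m):
    # h(k) = k mod m
    return k % m

def hash_multiplication(k, m, A=0.6180339887):
    # h(k) = floor(m * (k*A mod 1)); keeps the fractional part of k*A
    return math.floor(m * ((k * A) % 1))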

Perfect hashing means that each element is mapped to a unique value. This can be achieved with randomized hashes where $m \gg n$, but it is not necessarily guaranteed.

Runtimes

Assuming simple uniform hashing and chaining.

Function Best Case Average Case Worst Case
Lookup $\Theta(1)$ $\Theta(1 + n/m)$ $\Theta(n)$

With perfect hashing, a lookup only requires you to compute the hash and go to that position, so the time is constant. For a hash table with collisions, the lookup has to account for the average number of entries in each hash. The worst possible case is that every element is placed in the same hash, requiring a linear search.

Runtime Calculations

Runtime is a measurement of how efficient an algorithm is, and is by far the most important measurement in this course.

About Runtimes

In a world where computers are infinitely fast and have unlimited storage space, any correct algorithm could be used to solve a problem - no matter how badly designed it was. This is not the case in reality, which makes runtime an important aspect of programming.

A runtime describes the relationship between the input size and how many calculations (how much time) it takes to solve the problem. Consider a problem of a known size with runtime $\Theta(n^2)$. If we were to solve the same problem, but with twice the input size, the runtime would be $\Theta((2n)^2) = \Theta(4n^2)$. That is, it would require four times the effort. Of course, this growth is desired to be as low as possible.

Common Runtimes

The following table shows the asymptotic notation for run times. It denotes how the algorithm behaves as $n \to \infty$. As an example, if an algorithm has the runtime $\Theta(n^2 + 2n)$, it will be $\text{o}(n^3)$ since $\lim_{n \to \infty} \frac{n^2 + 2n}{n^3} = 0$.

Name Notation Runtime
Small-O $\text{o}(f(n))$ $< f(n)$
Big-O $\text{O}(f(n))$ $\leq f(n)$
Theta $\Theta(f(n))$ $= f(n)$
Big-Omega $\Omega(f(n))$ $\geq f(n)$
Small-Omega $\omega(f(n))$ $> f(n)$

The most common runtimes, sorted from worst to best:

Complexity Name Type
$\Theta(n!)$ Factorial General
$\Omega(k^n)$ Exponential General
$\text{O}(n^k)$ Polynomial General
$\Theta(n^3)$ Cubic Polynomial
$\Theta(n^2)$ Quadratic Polynomial
$\Theta(n \lg n)$ Linearithmic Combination of linear and logarithmic
$\Theta(n)$ Linear General
$\Theta(\lg n)$ Logarithmic General
$\Theta(1)$ Constant General

Recursion

Recursion is a problem-solving technique which is based on the fact that a solution to a problem consists of the solutions to smaller instances of the same problem. This technique will be a recurring theme in several of the algorithms in the curriculum.

One of the most common examples of recursion is the Fibonacci sequence, defined as:

$$F(n) = F(n-1) + F(n-2)\\ F(0) = 0, F(1) = 1$$

The $n$-th Fibonacci number is defined as the sum of the two previous Fibonacci numbers.
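
A minimal recursive sketch in Python (illustrative only; without memoization this runs in exponential time):

def fib(n):
    # Base cases: F(0) = 0, F(1) = 1
    if n < 2:
        return n
    # Recursive case: F(n) = F(n-1) + F(n-2)
    return fib(n - 1) + fib(n - 2)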

Examples of recursive algorithms include merge sort, quicksort and binary search. Recursive solutions can also be used in dynamic programming.

The Master Theorem

The Master Theorem is a recipe for finding the runtime of many recurrences.

$$ T(n) = aT(^n/_b) + f(n) \\ a \ge 1, b \gt 1 $$

This type of recurrence often occurs together with divide-and-conquer algorithms such as merge sort.

The problem is split into $a$ subproblems of size $^n/_b$, where $f(n)$ work is done for the splitting and for merging the results of the recursive calls after they are complete. In the merge sort example, $f(n)$ work is done to split the list into two sub-lists and then merge the two sorted lists after the recursive calls are complete. The split happens in constant time $\Theta(1)$, while merging takes linear time $\Theta(n)$. We can therefore set $f(n) = n$. Since we at each step divide the list into two parts, each of size $^n/_2$, we have $a=2$ and $b=2$. For merge sort we therefore have:

$$ T(n) = 2T(^n/_2) + n $$

If we didn't already know the runtime for Merge sort, we can quickly discover it by solving this recurrence. Solving the recurrence could also be achieved by using the Master Theorem as follows:

  1. Identify $a, b, f(n)$
  2. Calculate $\log_b a$
  3. Consult the following table

Table over three potential outcomes of the Master Theorem:

Case Requirement Solution
1 $f(n)\in O(n^{\log_b a-\epsilon})$ $T(n) \in \Theta(n^{\log_b a})$
2 $f(n)\in \Theta(n^{\log_b a} \log^k n)$ $T(n) \in \Theta(n^{\log_b a}\log^{k+1} n)$
3 $f(n)\in \Omega(n^{\log_b a+\epsilon})$ $T(n) \in \Theta(f(n))$

$\epsilon > 0$

Returning to the merge sort example. We have found $a=2, b=2, f(n)=n$, therefore $\log_b a = \log_2 2 = 1$. We are therefore in case 2, $f(n) \in \Theta(n^1)$, with the solution $T(n) \in \Theta(n\lg n)$.

Example

Another example retrieved from the 2009 Continuation Exam:

Solve the following recurrence. Give the answer in asymptotic notation. Give a short explanation.

$$ T(n) = 2T(^n/_3) + ^1/_n $$

We have $a = 2, b = 3, f(n) = n^{-1}$. This gives $\log_b a = \log_3 2 \approx 0.63$. Since $f(n) = n^{-1}$ is $O(n^{0.63 - \epsilon})$ (for instance with $\epsilon = 0.5$), we are in case 1, and the solution is $T(n) \in \Theta(n^{\log_3 2}) \approx \Theta(n^{0.63})$.

Sorting and Search

Sorting and search are two problems that occur either as stand-alone problems or as subproblems. Sorting means organizing a set of elements into a specific order. Searching means that we are looking for a certain element in a set of elements.

Search

Finding a specific element in a data structure is a common problem. In a list of numbers we may want to search for the median, or a certain number. In a graph we may want to search for a path from one element to another. Searching algorithms for graphs (DFS and BFS) are covered in Graph algorithms.

The two most common ways to search lists are Brute force and Binary search.

Brute force

This method is pretty self-explanatory. Iterate through the list (sorted or unsorted) from beginning to end and compare each element with the element you are trying to find. With $n$ elements in the list, the runtime will be $O(n)$.

Binary search

If we know that the list we are searching is sorted, we can use a better strategy than brute force. Let $a$ be the target value we are searching for in a sorted list $L$. We start by finding the middle element and check if it is equal to the element we are searching for. If so, we return the index and the search is complete. If it is not the element we are searching for, we compare it to our target value. If the middle value is larger than the search term, the search continues in the subset of smaller values (left). If it is less than the search term, the search continues in the subset of higher values (right). Regardless of direction, the search process will keep comparing the middle term and halving the possible answers until the target value is found.

In Python:

def binary_search(li, value, start=0):
    # If the list is empty, the element cannot be found
    if not li:
        return -1
    middle = len(li) // 2
    middle_value = li[middle]
    if value == middle_value:
        return start + middle
    elif value < middle_value:
        # First half
        return binary_search(li[:middle], value, start)
    elif value > middle_value:
        # Second half
        return binary_search(li[middle + 1:], value, start + middle + 1)

Each iteration will halve the searchable elements. The search can at most take $O(\log n)$ time, since there is only $\frac{n}{2^{log(n)}} = 1$ element left after $\log n$ halvings.

Best Case Average Case Worst Case
$O(1)$ $O(\log n)$ $O(\log n)$

Sorting

Sorting algorithms can be categorized into two groups: Comparison based and distributed/non-comparison-based.

Stability

A sorting algorithm is considered stable if the order of identical elements in the list to be sorted is preserved throughout the algorithm. Given the following list:

$$[B_1, C_1, C_2, A_1]$$

Sorting it with a stable algorithm the identical elements will stay in the same order before and after the sort.

$$[A_1, B_1, C_1, C_2]$$

Comparison Based Sorting Algorithms

All sorting algorithms that are based on comparing two elements to determine which of the two comes first in the sequence are considered comparison based. It is proven that $\Omega(n\lg n)$ is a lower bound on the number of comparisons a comparison based algorithm has to perform. In other words, it is impossible to produce a comparison based sorting algorithm with a better worst-case runtime than $O(n\lg n)$.

Merge sort

Merge sort is a comparison based sorting algorithm. It uses a divide-and-conquer strategy to sort the input list.

Best Case Average Case Worst Case
$O(n\log n)$ $O(n\log n)$ $O(n\log n)$

The core aspects of the algorithm are as follows:

  1. Divide the unsorted list into $n$ sublists which each contain one element (a list with only one element is sorted)
  2. Pairwise, merge together each sorted sublist.

For all practical purposes this is implemented recursively. The function takes one parameter, the list that is to be sorted. If the list contains only one element, the list is returned, since it is already sorted. If it is longer, we divide it into two equal sublists and recursively call merge sort on each sublist. The returned lists from the recursive calls are then merged together. In Python:

def merge_sort(li):
    if len(li) < 2:    # A list with one (or zero) elements is already sorted; return it
        return li
    sorted_l = merge_sort(li[:len(li)//2])
    sorted_r = merge_sort(li[len(li)//2:])
    return merge(sorted_l, sorted_r)

def merge(left, right):
    res = []
    while len(left) > 0 or len(right) > 0:
        if len(left) > 0 and len(right) > 0:
            if left[0] <= right[0]:
                res.append(left.pop(0))
            else:
                res.append(right.pop(0))
        elif len(left) > 0:
            res.append(left.pop(0))
        elif len(right) > 0:
            res.append(right.pop(0))
    return res

The merge function runs in linear time. To analyze the runtime of Merge sort we can set up the following recurrence:

$T(n) = 2T(n/2) + n$

In other words, it takes twice as long as sorting a list of length $^n/_2$, plus the time it takes to merge the two lists. The recurrence can easily be solved using the master theorem, which gives a runtime of $O(n\log n)$.

Most implementations of Merge Sort are stable. As a result of the algorithm continuously producing new sublists, it requires $O(n)$ extra memory space.

Quicksort

Quicksort is a comparison based sorting algorithm that also uses divide-and-conquer tactics.

Best Case Average Case Worst Case
$O(n\log n)$ $O(n\log n)$ $O(n^2)$

Similar to merge sort, quicksort divides the problem into smaller parts and then solves them recursively. The quicksort function takes only one parameter: the list that is to be sorted. The concept is as follows:

  1. If there is only one element in the list, the list is considered sorted and is returned.
  2. Select a pivot element, the easiest being the first element in the list.
  3. Create two lists, one (lo) which contains the elements from the original list that are smaller than the pivot element, and one (hi) that contains all elements larger than the pivot.
  4. Recursively sort lo and hi with quicksort, then return $lo + pivot + hi$ (with any elements equal to the pivot placed in the middle).

In Python:

def quicksort(li):
    if len(li) < 2:
        return li
    pivot = li[0]
    lo = [x for x in li if x < pivot]
    eq = [x for x in li if x == pivot]   # the pivot and any duplicates of it
    hi = [x for x in li if x > pivot]
    return quicksort(lo) + eq + quicksort(hi)

Calculating the runtime of quicksort is somewhat more difficult than for merge sort, as the sizes of the lists that are recursed on depend on the selected pivot. Selecting a good pivot is an art in itself. The naïve method of always selecting the first element can easily be exploited by, say, a list that is already sorted in reverse order. In this case the runtime will be $n^2$. This can be countered by selecting an arbitrary (random) pivot each time instead.

Bubblesort

Traverses the list and compares two adjacent elements at a time, swapping them if necessary. Note that the algorithm has to run through the list several times, at worst $n$ traversals.

Best Case Average Case Worst Case
$O(n)$ $O(n^2)$ $O(n^2)$

Insertion Sort

The most common way for humans to sort cards. Take the first element and place it in an empty "sorted" list. Take the next unsorted element and place it before or after the first element based on its value. Iteratively, insert element $i$ from the unsorted list into its correct position in the already sorted list of $i-1$ elements.

Best Case Average Case Worst Case
$O(n)$ $O(n^2)$ $O(n^2)$
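
A minimal insertion sort sketch in Python, following the description above (an illustrative example):

def insertion_sort(li):
    for i in range(1, len(li)):
        key = li[i]                # next unsorted element
        j = i - 1
        # shift larger elements in the sorted prefix one step to the right
        while j >= 0 and li[j] > key:
            li[j + 1] = li[j]
            j -= 1
        li[j + 1] = key            # insert the element in its correct position
    return li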

Selection sort

A slow sorting algorithm that searches through the entire list and selects the smallest element each time. ${\tiny At}$ ${\tiny least}$ ${\tiny it}$ ${\tiny works..}$

Best Case Average Case Worst Case
$O(n^2)$ $O(n^2)$ $O(n^2)$

Other Sorting Algorithms

Sorting algorithms that are not comparison based are not limited by the $\Omega(n\lg n)$ lower bound on runtime. The prerequisite for these algorithms is that certain attributes of the lists are already known before sorting.

Heapsort

There are several methods of organizing a heap. A max-heap is a binary tree where each node's value is never higher than its parent's value. A min-heap is the opposite, where no node is allowed a lower value than its parent. A heap can be built in $\text{O}(n)$ time.

Methods for heap include:

Method Runtime
Build-max-heap $O(n)$
Extract-max $O(\log n)$
Max-heapify $O(\log n)$
Max-heap-insert $O(\log n)$
Heap-increase-key $O(\log n)$
Heap-maximum $\Theta(1)$

Heapsort builds a max-heap, repeatedly places the highest element at the end of the list, and maintains the heap with max-heapify.

Best Case Average Case Worst Case
$O(n \log n)$ $O(n \log n)$ $O(n \log n)$
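
A sketch of heapsort with an explicit max-heapify in Python (an illustrative example, not from the original):

def max_heapify(A, i, heap_size):
    # Sink A[i] down until the max-heap property holds in its subtree
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < heap_size and A[left] > A[largest]:
        largest = left
    if right < heap_size and A[right] > A[largest]:
        largest = right
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest, heap_size)

def heapsort(A):
    # Build a max-heap bottom-up, O(n)
    for i in range(len(A) // 2 - 1, -1, -1):
        max_heapify(A, i, len(A))
    # Repeatedly move the current maximum to the end and restore the heap
    for end in range(len(A) - 1, 0, -1):
        A[0], A[end] = A[end], A[0]
        max_heapify(A, 0, end)
    return A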

Counting sort

Counting sort assumes that the input is a list $L$ consisting only of integer elements less than or equal to some integer $k$. The algorithm first creates a list, call it $C$, of length $k+1$, which counts the occurrences of each value in $L$. $C$ is then made cumulative. This information is then used to place the values from $L$ into their correct positions. Note that after creating the cumulative $C$ list, counting sort iterates over $L$ backwards. This is done to make the output stable.

Best Case Average Case Worst Case
$O(n + k)$ $O(n + k)$ $O(n + k)$

If $k = O(n)$ , the runtime will be $ \Theta(n)$
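
A counting sort sketch in Python, following the description above (assuming all elements are integers in $[0, k]$):

def counting_sort(L, k):
    C = [0] * (k + 1)
    for x in L:                  # count the occurrences of each value
        C[x] += 1
    for i in range(1, k + 1):    # make C cumulative: C[i] = number of elements <= i
        C[i] += C[i - 1]
    out = [0] * len(L)
    for x in reversed(L):        # iterate backwards to keep the sort stable
        C[x] -= 1
        out[C[x]] = x
    return out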

Radix sort

Radix sort assumes that the input is $n$ elements with $d$ digits, where each digit can have up to $k$ different values. The algorithm usually starts with the least significant digit and sorts the list according to this digit before moving on to the next least significant digit. If the sorting is based on a stable algorithm that sorts in $\Theta(n+k)$ (e.g. counting sort), then the entire algorithm will use $\Theta(d(n+k))$ time.

Best Case Average Case Worst Case
$\Theta(d(n+k))$

Bucket sort

Bucket sort assumes that the input is generated from an arbitrary process that distributes the elements uniformly and independently across an interval. Bucket sort divides the interval into $n$ equally large buckets and distributes the $n$ input values into the buckets. Each bucket is then individually sorted by using a sorting algorithm such as another bucket sort, insertion sort etc. The buckets will often be small, so using an algorithm with a low amount of constant calculations could prove useful.

Best Case Average Case Worst Case
$\Theta(n)$ $\Theta(n)$ $\Theta(n^2)$

Topological sort

Topological sort is a sorting algorithm that is used to order the nodes of a Directed Acyclic Graph (DAG) into a linear sequence. If there exists an edge $(u, v)$, then the node $u$ must come before $v$ in the sequence. DFS is used to find this sequence.

Graphs and Graph Algorithms

A graph is a mathematical structure that is used to model pairwise relations between objects. In other words: a graph is an overview over several small relations. In this course graphs are some of the most important data structures and have several accompanying algorithms.

Representation

A graph can look like this:

An undirected graph with six nodes

There are multiple ways to implement graphs. One could take an object oriented approach, where node objects have pointers to their children. Another more common method is to use neighbor-lists or matrices, where each entry represents a connection in the graph.

Neighbor Lists

Given the graph G with nodes $V = \{A,B,C,D\}$, we can represent each node's neighbors in the following manner:

$$A: [B, C] \\ B:[A, C] \\ C:[D]$$

To clarify, $A$ has edges to $B$ and $C$, $B$ has edges to $A$ and $C$, and $C$ has an edge to $D$. This method of representing graphs is useful when a graph is sparse, meaning it has relatively few edges.
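
The same neighbor lists could be sketched as a Python dictionary (an illustrative example):

G = {
    'A': ['B', 'C'],
    'B': ['A', 'C'],
    'C': ['D'],
    'D': [],
}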

Neighbor Matrices

For dense graphs, graphs with relatively many edges, neighbor-matrices may prove more useful for representing edges. For a graph with $|V|$ nodes, this would require a $|V| \times |V|$ sized matrix. Example:

    0 1 2 3 4 5
  -------------
0 | 0 1 0 0 0 1
1 | 0 0 1 1 0 1
2 | 1 1 0 1 0 0
3 | 0 0 1 0 1 0
4 | 0 0 0 0 0 1
5 | 1 0 1 0 1 0

This matrix represents a graph $G$ with nodes $V = \{0,1,2,3,4,5\}$. The matrix entry $a_{u,v}$ is $1$ if there is an edge from node $u$ to node $v$, and $0$ otherwise. Neighbor matrices can also be modified to represent weighted graphs, such that the entry $a_{u,v}$ now represents the weight on the edge $(u,v)$. For example, if the weight on the edge $(u,v)$ is $w(u,v) = 3$, the neighbor matrix entry $a_{u,v}$ is equal to $3$.

Traversal

Now that we have a representation for graphs, it is possible to traverse them. The two most common methods of traversal are Breadth First Search (BFS) and Depth First Search (DFS).

Breadth First Search (BFS)

Breadth first search is a graph traversal algorithm that, as it examines a node, adds all of that node's children to a queue. More precisely:

  1. The input is a graph and a starting node. The algorithm uses a queue to establish in which order it visits each node. The first node added to the queue is the starting node.
  2. As long as there are elements in the queue, the algorithm removes (dequeues) the first element from the queue, marks the node as visited, then enqueues its unvisited children.

BFS can mark each node with a distance value, which indicates how many edges away it is from the starting node. Also, if you are searching for an element using BFS, the search can be terminated when the node you are searching for is found and dequeued.

Runtime: $O(|V| + |E|)$
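
A minimal BFS sketch in Python, assuming the graph is given as a dictionary of neighbor lists as above (an illustrative example):

from collections import deque

def bfs(G, start):
    # distance[node] = number of edges from start; also serves as the visited set
    distance = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()          # dequeue the first element
        for v in G[u]:               # enqueue all unvisited neighbors
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    return distance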

Depth First Search (DFS)

DFS is similar to BFS, but uses a stack instead of a queue. Since a stack is LIFO, each child node is pushed onto the stack until a leaf node is reached. The node on the top of the stack is then popped, handled and discarded before moving to the underlying node. More precisely:

  1. The input is a graph and a starting node. The algorithm uses a stack to establish in which order it visits each node. The first node added to the stack is the start node.
  2. The algorithm pushes child nodes onto the stack until a leaf node is reached. The topmost node on the stack is then handled (popped), and this repeats until the stack is empty.

DFS does not find the distances between nodes in a graph, but it can record when each node is pushed onto the stack and when it is popped off again. These are called start and finish times, which can prove useful for topological sorting.

Runtime: $O(|V| + |E|)$
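
A minimal DFS sketch in Python with an explicit stack (an illustrative example):

def dfs(G, start):
    visited = []
    stack = [start]
    while stack:
        u = stack.pop()              # take the topmost node off the stack
        if u in visited:
            continue
        visited.append(u)            # handle the node
        for v in G[u]:               # push its neighbors
            if v not in visited:
                stack.append(v)
    return visited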

Traversing order

In-order, pre-order, and post-order traversal all walk the tree in the same way; the difference is when each node is handled (sometimes called visited or colored). When traversing a tree, these traversal methods work in the following way:

  • In a pre-order traversal, nodes are handled before visiting any subtrees.
  • In an in-order traversal, nodes are handled after visiting the left subtree.
  • In a post-order traversal, nodes are handled after visiting both subtrees.
Traversal method Result
Pre-order F, B, A, D, C, E, G, I, H
In-order A, B, C, D, E, F, G, H, I
Post-order A, C, E, D, B, H, I, G, F

Minimal Spanning Trees

A minimal spanning tree is a tree that connects all the nodes in the graph and has the lowest possible total edge weight.

Kruskal

Kruskal's algorithm creates a tree by finding the smallest edges in the graph one by one, creating a forest of trees. These trees are gradually merged together into one tree, which becomes the minimal spanning tree. First the edge with the smallest weight is found and made into a tree. Then the algorithm looks at the edge with the second smallest weight and connects its two nodes. If these were two free nodes, they are connected as a new tree; if one of them is connected to an existing tree, the free node is attached to that tree; if both are in separate trees, the two trees are merged; and if both are in the same tree, the edge is ignored. This continues until we have one complete tree with all the nodes.

Best Case Average Case Worst Case
$O(E \log V)$
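
A sketch of Kruskal's algorithm in Python using union-find, assuming the graph is given as a list of weighted edges (an illustrative example):

def kruskal(nodes, edges):
    # edges: list of (weight, u, v) tuples; returns the edges of a minimal spanning tree
    parent = {v: v for v in nodes}

    def find(v):
        # find the representative of the tree v belongs to, with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    mst = []
    for w, u, v in sorted(edges):    # consider edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                 # u and v are in different trees: merge them
            parent[ru] = rv
            mst.append((u, v, w))
    return mst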

Prim

Prim's algorithm creates a tree by starting in an arbitrary node and then adding the edge (and its connecting node) with the smallest weight to the tree. With two nodes now in the tree, the lowest-weight edge connecting a new node to either of them is added. This continues until all of the nodes are part of the tree. The runtime depends on the underlying data structure; the curriculum uses a binary heap.

Best Case Average Case Worst Case
$O(E \log V)$
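
A sketch of Prim's algorithm in Python using a binary heap, assuming the graph is a dictionary mapping each node to a list of (neighbor, weight) pairs (an illustrative example):

import heapq

def prim(G, start):
    visited = {start}
    heap = [(w, start, v) for v, w in G[start]]
    heapq.heapify(heap)
    mst = []
    while heap and len(visited) < len(G):
        w, u, v = heapq.heappop(heap)     # lightest edge leaving the tree
        if v in visited:
            continue
        visited.add(v)
        mst.append((u, v, w))
        for x, wx in G[v]:                # new candidate edges out of the tree
            if x not in visited:
                heapq.heappush(heap, (wx, v, x))
    return mst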

Shortest Path

Finding the shortest path between two nodes is a common problem. It can be applied to, for instance, finding the best GPS route.

In general, shortest path problems are divided into subproblems where we look at the shortest stretches between nodes.

Typical issues of concern when selecting an algorithm are:

  • Is the graph directed or undirected (ordinary)?

  • Are there negative edges?

  • Do cycles exist in the directed graph? If the cycle is positive, there will always be a shorter path that does not include the cycle.

  • Do negative cycles exist? In that case, shortest paths are not well defined, since the cycle can be traversed repeatedly to keep decreasing the cost.

One-to-All

One-to-all means finding the distance to all nodes from a single starting node.

Relax

Relax is a function which shortens the estimated distance to a node $v$. Assuming we know the distances to all parent nodes of $v$ from some starting node $s$, we can relax $v$ such that its distance $v.d$ is updated to $v.d = u.d + w(u,v)$, where $u$ is the parent that minimizes this distance. If this gives an improved distance, relax also sets $v.\pi = u$.

An important property of relax is that if $p$ is the shortest path from $s$ to $v$, and we relax each node in order along the path, then $v.d$ will be set to the lowest distance possible.
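
A minimal sketch of relaxing a single edge $(u, v)$ with weight $w$ in Python, with the distance and predecessor estimates kept in dictionaries (an illustrative example):

def relax(u, v, w, d, pi):
    # d: current distance estimates, pi: current predecessors
    if d[v] > d[u] + w:
        d[v] = d[u] + w   # improved distance estimate for v
        pi[v] = u         # remember which parent gave the improvement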

Dijkstra's algorithm

Dijkstra's algorithm only works if all edges are non-negative.

Best Case Average Case Worst Case
$O(|E|\log|V| + |V|\log |V|)$

If negative edges or negative cycles exist, the algorithm is not guaranteed to produce correct results.

Dijkstra is most efficient when implemented with a heap. Implementations with other structures can give higher runtimes and are not in the curriculum.

How it works:

  1. Initialize distances: set the starting node's distance to $0$ and all others to $\infty$.
  2. Initialize visited: mark the start node as visited, and all others as not visited.
  3. Look at all neighbors of the current node; calculate and update their distances.
  4. Go to the unvisited node with the lowest distance from the start, and mark it as visited.
  5. Repeat steps $3$-$4$ until all nodes have been visited or a specific end node is visited.

In python:

def dijkstra(G,start): # G is the graph, start is the start node
    pq = PriorityQueue() # initialize a heap
    start.setDistance(0) # initialize the start node's distance
    pq.buildHeap([(node.getDistance(),node) for node in G]) # build a heap containing all the nodes
    while not pq.isEmpty():
        currentVert = pq.delMin()    # greedily pick the node with the shortest distance
        for nextVert in currentVert.getConnections():
            newDist = currentVert.getDistance() + currentVert.getWeight(nextVert) # calculate new distances
            if newDist < nextVert.getDistance():
                nextVert.setDistance( newDist ) # update the distance
                nextVert.setPred(currentVert) # update the parent to the one giving the shortest distance
                pq.decreaseKey(nextVert,newDist) # restore the heap order so the smallest distance comes first

Bellman-Ford

Unlike Dijkstra, Bellman-Ford also works with negative edges.

Best Case Average Case Worst Case
$O(|V| |E|)$

Bellman-Ford is defined recursively. In contrast to Dijkstra, Bellman-Ford relaxes all edges multiple times. This helps discover any potential negative cycles. All edges are relaxed $|V| - 1$ times. Note that the order in which the edges are relaxed does not matter in the Bellman-Ford algorithm. Since all edges are relaxed $|V| - 1$ times, it is guaranteed to find the shortest one-to-all paths if there are no negative cycles. If one more iteration is run and the result is compared to the previous iteration, one can deduce whether or not there are negative cycles in the graph. This is because a negative cycle can be relaxed indefinitely.

How it works:

  1. Initialize the algorithm by setting the distance from the start node to $0$, and all others to $\infty$.
  2. Relax all edges and update each node with any improved distance; repeat this $|V| - 1$ times.
  3. Run one more iteration to check for negative cycles.

In python:

def bellman_ford(G, s):  # G is the graph, with nodes and edge weights. s is the start node.
    distance = {}
    parent = {}
    for node in G:  # initialize starting values
        distance[node] = float('Inf')
        parent[node] = None
    distance[s] = 0  # initialize the start node

    # relax all edges |V| - 1 times
    for _ in range(len(G) - 1):
        for u in G:    # from node u
            for v in G[u]:    # to its neighbors
                if distance[v] > distance[u] + G[u][v]:  # check if the distance is smaller
                    distance[v] = distance[u] + G[u][v]    # if so, update the distance
                    parent[v] = u    # and update the parent

    # check for negative cycles
    for u in G:
        for v in G[u]:
            if distance[v] > distance[u] + G[u][v]:
                return False

    return distance, parent

DAG shortest path

It is possible to topologically sort a DAG (Directed Acyclic Graph) to find the correct order in which to relax each edge to find the shortest one-to-all path.

Pseudocode from Cormen:

DAG-SHORTEST-PATHS(G, w, s)
    for each vertex v in G.V
        v.d = inf  # Distance
        v.p = null # Parent
    s.d = 0
    for each vertex u in G.V, taken in topologically sorted order
        for each adjacent vertex v to u
            if v.d > u.d + w(u, v) # Relax
                v.d = u.d + w(u, v)
                v.p = u

Best Case Average Case Worst Case
$\Theta(|V| + |E|)$ $\Theta(|V| + |E|)$ $\Theta(|V| + |E|)$

All to All

Johnson's algorithm

Johnson's algorithm combines Dijkstra and Bellman-Ford such that it finds an all-to-all distance matrix for graphs. It allows graphs with negative-weight edges, but not negative-weight cycles. It works best with graphs with few edges.

How it works:

Let $G$ be the input graph with nodes $G.nodes$ and edges $G.edges$, where each edge has a weight $G.edge.weight$. First, the algorithm constructs a new graph $G_s$ containing all nodes and edges from $G$, as well as a new node $s$ with edges connecting it to every other node. Then Bellman-Ford is run on $G_s$ with $s$ as the starting node to relax each edge, as well as to ensure there are no negative-weight cycles. Let $h(n)$ be a function that for each node outputs the distance from $s$ to $n$; this distance is found by the Bellman-Ford run on $G_s$. Now each edge $(u,v)$ in $G.edges$ (the original input graph, since $s$ is redundant) is given a new weight defined by $G.edge.weight = G.edge.weight + h(u) - h(v)$. After these initial steps, every edge weight is non-negative while the shortest paths between any two nodes in $G$ are preserved. These steps (corresponding to lines $1$ through $6$) have removed negative edges in the graph using telescoping sums: each node distance is both added and subtracted, so they cancel out everywhere except at the ends of each path. The algorithm now runs Dijkstra from every node, and reverts the changes made to the weights before each distance is added to a matrix $D$.

The following is pseudocode of Johnson's algorithm.

1  G_s = construct_G_with_start_node_s()
2  Bellman_Ford(G_s, s)
3  for node in G.nodes
4      h(node) = G_s.node.distance #distance s to node
5  for edge in G.edges
6      edge.weight = edge.weight + h(edge.parent) - h(edge.child)
7  D = matrix(n, n) #some n*n matrix
8  for node_I in G.nodes
9      Dijkstra(G, node_I) #uses dijkstra to find distances from node "node_I" with updated edge.weights in G
10     for node_J in G.nodes 
11         D[I, J] = node_J.distance + h(node_J) - h(node_I) #add all distances starting in node I to the matrix, with the reweighting reverted
Best Case Average Case Worst Case
$O(|V|^2 \lg |V| + |V| |E|)\dagger$

$\dagger$ Implemented with Fibonacci heap

Floyd-Warshall

Floyd-Warshall works even if there are negative edges, but not with negative cycles. The graph must be stored as a neighbor matrix, not as lists.

Pseudocode from Wikipedia:

let dist be a |V| × |V| array of minimum distances initialized to ∞ (infinity)
for each vertex v
    dist[v][v] ← 0
for each edge (u,v)
    dist[u][v] ← w(u,v)  // the weight of the edge (u,v)
for k from 1 to |V|
    for i from 1 to |V|
        for j from 1 to |V|
            if dist[i][j] > dist[i][k] + dist[k][j]
                dist[i][j] ← dist[i][k] + dist[k][j]
            end if

In Python:

def FloydWarshall(W):
    n = len(W)
    D = [row[:] for row in W]    # copy of the weight matrix; D[i][j] will hold the shortest distances
    PI = [[0 for i in range(n)] for j in range(n)]    # predecessor matrix
    for i in range(n):
        for j in range(n):
            if i == j or W[i][j] == float('inf'):
                PI[i][j] = None
            else:
                PI[i][j] = i

    for k in range(n):           # allow node k as an intermediate node
        for i in range(n):
            for j in range(n):
                if D[i][j] > D[i][k] + D[k][j]:
                    D[i][j] = D[i][k] + D[k][j]
                    PI[i][j] = PI[k][j]
    return (D, PI)
Best Case Average Case Worst Case
$\Theta(|V|^3)$ $\Theta(|V|^3)$ $\Theta(|V|^3)$

The algorithm creates matrices $D^{(k)}$, where $D^{(k)}[i][j]$ gives the shortest path from $i$ to $j$ that only passes through intermediate nodes numbered $k$ or less.

Max Flow

Flow can be visualized as, for example, a pipe system delivering water to a city, or as a network with differing capacities on the various cables. Max flow is the maximum net capacity that actually flows through the network. There may exist some critical pipes (edges) with very low capacity, bottlenecking the entire network no matter the size of the other pipes. Max flow is achieved when there are no more augmenting paths.

Flow Network

A flow network is a directed graph where all the edges have a non-negative capacity. Additionally, it is required that if there exists an edge between $u$ and $v$, there exists no edge in the opposite direction from $v$ to $u$. A flow network has a source, $s$, and a sink, $t$. The source can be seen as the starting node and the sink as the end node. The graph is connected, so for every node $v$ there exists a path $s \leadsto v \leadsto t$. All nodes except $s$ have at least one inbound edge. All nodes except the source and the sink have a net flow of 0 (all inbound flow = all outbound flow).

A flow network can have several sources and sinks. To eliminate this problem, a super source and a super sink are created and linked up to all respective sources and sinks with edges that have infinite capacity. This new network with only one source and sink is much easier to solve.

Residual Network

The residual network is the capacity left over: $$ c_f(u,v) = c(u,v) - f(u,v)$$

Keeping track of the residual network is useful. If we send 1000 liters of water from $u$ to $v$, and 300 liters from $v$ to $u$, it is equivalent to sending 700 liters from $u$ to $v$.

Augmenting Path

An augmenting path is a path from the source to the sink that increases the total flow in the network. It can be found by looking at the residual network.

Minimal cut

A cut in a flow network divides the graph into two sets, $S$ and $T$. It is useful to look at the flow through this cut:

$$ f(S,T) = \sum_{u \in S}\sum_{v \in T} f(u,v) - \sum_{u \in S}\sum_{v \in T} f(v,u) $$ $$ c(S,T) = \sum_{u \in S}\sum_{v \in T} c(u,v) $$ The number of potential cuts in a network with n nodes is:

$$|C| = 2^{n-2}$$

Of all possible cuts, we are interested in the cut with the smallest capacity, as this is the bottleneck of the network. It can be proven that finding the minimal cut simultaneously gives us the maximum flow through the network:

$$f(S,T)\ =\ c(S,T)$$

Ford-Fulkerson's method

Given a flow $f$, in each iteration of Ford-Fulkerson we find an augmenting path $p$ and use $p$ to modify (augment) $f$.

Runtime: $O(E \cdot |f^*|)$ for the generic method with integer capacities, where $|f^*|$ is the value of a maximum flow; $O(VE^2)$ when the augmenting paths are found with BFS (Edmonds-Karp).

Edmonds-Karp

Edmonds-Karp is Ford-Fulkerson's method where BFS is used to find the augmenting paths. This ensures a runtime of $O(VE^2)$.
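
A compact Edmonds-Karp sketch in Python, assuming the flow network is given as a capacity matrix (an illustrative example, not from the original):

from collections import deque

def edmonds_karp(cap, s, t):
    # cap: |V| x |V| capacity matrix; returns the value of the maximum flow from s to t
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    max_flow = 0
    while True:
        # BFS in the residual network to find an augmenting path
        parent = [None] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] is None:
            u = queue.popleft()
            for v in range(n):
                if parent[v] is None and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] is None:            # no augmenting path left: the flow is maximal
            break
        # find the bottleneck (residual) capacity along the path
        bottleneck = float('inf')
        v = t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        # send the bottleneck amount of flow along the path
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck     # allows later cancellation via the residual edge
            v = u
        max_flow += bottleneck
    return max_flow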

Whole Number Theorem

If all capacities in the flow network are whole numbers, then Ford-Fulkerson's method will find a max flow with a whole number value, and the flow between all neighbor nodes will have whole number values.

Dynamic Programming

Dynamic programming divides problems into smaller subproblems, just like divide-and-conquer. DP can be used when the problem has both optimal substructure and overlapping subproblems.

It is often possible to use divide-and-conquer on these types of problems, but then a lot of redundant work is done, i.e. the same subproblem is solved several times. DP only needs to solve each subproblem once, and then reuses the result each time it is needed.

DP is usually used for optimization problems. Since there can be several optimal solutions, the solution found is one of possibly many. The usual approach consists of four steps:

  1. Characterize the structure of the optimal solution.
  2. Recursively define the value of an optimal solution.
  3. Compute the value of an optimal solution, typically in a bottom-up fashion.
  4. Construct an optimal solution from computed information.

Longest common substring

This is solved with DP from the bottom up, by looking at the last elements of each list.
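
A bottom-up sketch in Python for the longest common substring (an illustrative example):

def longest_common_substring(a, b):
    # dp[i][j] = length of the longest common suffix of a[:i] and b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                best = max(best, dp[i][j])
    return best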

Rod-cutting

Given a rod of length $n$ and a list of prices for all lengths up to $n$, decide the maximum profit you can achieve by cutting the rod into pieces and selling them.

We can cut up a rod of length $n$ in $2^{n-1}$ different ways. We have a binary choice of cutting or not cutting at distance $i$ inches from the left end for $i = 1,2,...,n-1$, meaning we can structure the choices as a binary tree.
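
A bottom-up rod-cutting sketch in Python, assuming prices[i] is the price of a rod of length i (an illustrative example):

def rod_cutting(prices, n):
    # r[j] = best obtainable revenue for a rod of length j
    r = [0] * (n + 1)
    for j in range(1, n + 1):
        best = 0
        for i in range(1, j + 1):      # the first piece has length i
            best = max(best, prices[i] + r[j - i])
        r[j] = best
    return r[n]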

Greedy Algorithms

Sometimes DP is a bit overkill. Greedy algorithms choose the solution that seems best right then and there, and move on. This does not always lead to the best solution, but for many problems it works fine. Minimal spanning trees are an example of a greedy method. Greedy algorithms solve the problem if it has the following attributes:

  • $Greedy-choice\ property$: We can make a choice that seems optimal at the time and solve the subproblems that occur later. The greedy choices cannot depend on future choices or on the solutions to all subproblems. The algorithm simply makes greedy choices iteratively, reducing the given problem to a smaller one.

  • $Optimal\ substructure$: Optimal solutions to a problem incorporate optimal solutions to related subproblems, which we may solve independently.

Planning activities

Choose as many non-overlapping activities as possible. Assume that they are sorted by ending time and use a greedy algorithm to select them.
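
A minimal sketch in Python, assuming the activities are given as (start, end) pairs sorted by end time (an illustrative example):

def select_activities(activities):
    chosen = []
    last_end = float('-inf')
    for start, end in activities:
        if start >= last_end:        # greedily take the first activity that is compatible
            chosen.append((start, end))
            last_end = end
    return chosen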

Huffman's algorithm

Huffman's algorithm is a greedy algorithm with the intent of minimizing the space needed to store a known sequence of symbols. Each symbol is represented by a binary code. The code given to a symbol depends on the frequency of the symbol, where the more frequent symbols get the shorter codes. Depending on the symbol distribution, this reduces the space required by 20-90%.

Huffman coding is performed by repeatedly combining the two nodes with the smallest frequency values, using their sum as the value of a new node. By branching left with $0$s and right with $1$s, all symbols can be represented with unique binary codes. The symbols that are most frequent receive shorter codes than the sparse ones.

Runtime:

Best case Average case Worst case
$O(n \lg n)$ $O(n \lg n)$ $O(n \lg n)$
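
A sketch of Huffman coding in Python using a binary heap (an illustrative example, not from the original):

import heapq
from itertools import count

def huffman(frequencies):
    # frequencies: dict symbol -> frequency; returns dict symbol -> binary code string
    tie = count()   # tie-breaker so the heap never compares the tree tuples themselves
    heap = [(freq, next(tie), (sym, None, None)) for sym, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # the two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (None, left, right)))
    codes = {}
    def walk(node, code):
        sym, left, right = node
        if sym is not None:                   # leaf node: assign the accumulated code
            codes[sym] = code or '0'
            return
        walk(left, code + '0')                # branch left with 0
        walk(right, code + '1')               # branch right with 1
    walk(heap[0][2], '')
    return codes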

One version of Huffman coding can look like this:

Huffman coding

Multithreading

The basics of Multithreading

Concurrency keywords

  • Spawn - used to start a parallel thread, allowing the parent thread to continue with the next line while the child thread runs its subproblem.
  • Sync - a procedure cannot safely use the values returned by its spawned children until after it executes a sync statement. This keyword indicates that the program must wait until the spawned threads are finished, so that the correct values are available.
  • Serialization - the serial algorithm that results from deleting the multithreaded keywords: spawn, sync, and parallel.
  • Nested parallelism - occurs when the keyword spawn precedes a procedure call. A child thread is spawned to solve a subproblem, which in turn may spawn its own subproblems, potentially creating a vast tree of subcomputations, all executing in parallel.
  • Logical parallelism - the concurrency keywords express logical parallelism, indicating which parts of the computation may proceed in parallel. At runtime it is up to a scheduler to determine which subcomputations actually run concurrently.

Performance Measures

Two metrics, work and span.

  • Work - The work of a multithreaded computation is the total time to execute the entire computation on one processor; it is the sum of the times taken by each strand.
  • Span - The span is the longest time to execute the strands along any path of the DAG. For a DAG in which each strand takes unit time, the span equals the number of vertices on a longest (critical) path in the DAG. If the time to compute is denoted $T_P$, where $P$ is the number of processors used, the span is the running time if we could run each strand on its own processor (or had infinitely many processors), $T_\infty$.
  • Work Law - $T_P \geq T_1 / P$
  • Span Law - $T_P \geq T_\infty$
  • Speedup - $T_1/T_P$, which says how many times faster the computation is on $P$ processors than on one processor.

  • Rewriting the work law we get $T_1/T_P \leq P$, stating that the speedup on $P$ processors can be at most $P$.
  • $T_1/T_P = \Theta(P)$ implies linear speedup.
  • $T_1/T_P = P$ implies perfect linear speedup.

  • The ratio $T_1/T_\infty$ of the work to the span gives the parallelism of the multithreaded computation.

Problem Complexity

P, NP, NPC

The question of whether $P = NP$ is one of the seven Millennium Prize Problems.

Relationship between P, NP, NPH and NPC. We know that $P$ is a subset of $NP$, but we do not know if $P=NP$.

P Polynomial time
NP Nondeterministic polynomial time
NPC Nondeterministic polynomial time complete
NPH Nondeterministic polynomial time hard

That a problem is in P implies that it is solvable in polynomial time. An NP problem is a problem whose solution we can verify in polynomial time. If it is possible to falsify a solution in polynomial time, the problem is part of the co-NP class. NP-hard (NPH) problems cannot be solved in polynomial time unless $P = NP$. It is said that this class of problems is at least as difficult as the most difficult problems in the NP class. In other terms, they are problems to which NP problems can be reduced in polynomial time, but whose solutions cannot necessarily be verified in polynomial time. NP-hard problems whose solutions can be verified in polynomial time are called NP-complete.

Reducibility-relation

To understand the proof technique used to prove that a problem is NPC, a few definitions need to be in place. One of these is the reducibility relation $\leq_P$. In this context it is used to say that a language ($L_1$) is polynomial-time reducible to another language ($L_2$): $L_1 \leq_P L_2$.

Cormen et al. exemplify this by noting that the linear equation $ax + b = 0$ can be transformed into $0x^2 + ax + b = 0$. All viable solutions for the quadratic equation are also viable solutions for the linear equation. The idea behind the example is to show that a problem X can be reduced to a problem Y, so that the input values for X can be solved with Y. If you can reduce X to Y, it implies that solving Y requires at least as much as solving X, and therefore Y must be at least as difficult to solve as X. It is worth mentioning that this relation is not symmetric, and you therefore cannot in general use X to solve Y.

Some known NPC problems

Some common examples of NPC problems include:

  • Circuit-SAT
  • SAT
  • 3-CNF-SAT
  • The Clique problem
  • Vertex-Cover
  • Hamilton-Cycle
  • Travelling Salesman (TSP)
  • Subset-Sum

But how can we prove a problem is NPC?

Cormen et al. divides this into two parts:

  1. Prove that the problem belongs to the NP class. Use a certificate to show that a solution can be verified in polynomial time.
  2. Prove that the problem is NP-hard. This is done through polynomial-time reduction.

A problem is NPC if it is both an NP problem and an NP-hard problem.

This connection is illustrated in the figure below: Connection between some NPC problems

Given that you are trying to prove that TSP is an NPC-problem:

  1. Prove that TSP $\in$ NP. The certificate you use can be a sequence of the n nodes that you visit on your trip. If the sequence consists only of unique nodes (we do not visit the same city more than once), then we can sum up the costs and verify that the total cost is less than a given number k.
  2. Prove that TSP is NP-hard. If you have understood what the different problems described above are, then you will recognize that TSP contains the Hamilton Cycle problem (given an undirected graph G, does there exist a cycle that contains all the nodes exactly once, where the starting node = the end node). We know that TSP is at least as difficult to solve as the Ham-cycle problem (as Ham-cycle is a subproblem of TSP). Given that we have already proved that the Ham-cycle problem is NP-hard, we can show with a polynomial-time reduction that TSP is also NP-hard (since we reduce Ham-cycle to TSP in polynomial time, TSP is NPH).

$HAM-CYCLE \leq_P TSP$.

As most people now believe that $P \ne NP$, it follows that NPH problems cannot be solved in polynomial time, and neither can NPC problems. If you manage to prove either $P \ne NP$ or $P = NP$, you have a claim to a rather large sum of money.

Runtimes for curriculum algorithms

Sorting and selection

  • $n$ is the number of elements that are being handled.
  • $k$ is the highest value possible.
  • $d$ is the max number of digits an element can have.
Algorithm Best case Average case Worst case Space Complexity Worst Case Stable
Merge sort $\Omega(n \lg n)$ $\Theta(n \lg n)$ $O(n \lg n)$ $O(n)$ Yes
Quick sort $\Omega(n \lg n)$ $\Theta(n \lg n)$ $O(n^2)$ $O(n)$ No
Heap sort $\Omega(n \lg n)$ $\Theta(n \lg n)$ $O(n \lg n)$ $O(1)$ No
Bubble sort $\Omega(n)$ $\Theta(n^2)$ $O(n^2)$ $O(1)$ Yes
Insertion sort $\Omega(n)$ $\Theta(n^2)$ $O(n^2)$ $O(1)$ Yes
Selection sort $\Omega(n^2)$ $\Theta(n^2)$ $O(n^2)$ $O(1)$ No
Bucket sort $\Omega(n + k)$ $\Theta(n + k)$ $O(n^2)$ $O(n)$ Yes
Counting sort $\Omega(n + k)$ $\Theta(n + k)$ $O(n + k)$ $O(k)$ Yes
Radix sort $\Omega(d(n + k))$ $\Theta(d(n + k))$ $O(d(n + k))$ $O(n+k)$ Yes*
Select $O(n)$ $O(n)$ $O(n)$ NA NA
Randomized select $O(n)$ $O(n)$ $O(n^2)$ NA NA
  * Radix sort requires a stable sub-sort to itself be stable.

Heap-operations

Operations Runtime
Insert $O(\lg n)$
Delete $O(\lg n)$
Build $\Theta(n)$
Max-heapify $O(\lg n)$
Increase-key $O(\lg n)$
Maximum $\Theta(1)$

Graph-operations

Algorithm Best case Average case Worst case
Topological sort $\Theta(V + E)$ $\Theta(V + E)$ $\Theta(V + E)$
Depth-first-search $O(E+V)$ $O(E+V)$ $O(E+V)$
Breadth-first-search $O(E+V)$ $O(E+V)$ $O(E+V)$
Prim's $O(E \lg V)$ $O(E \lg V)$ $O(E \lg V)$
Kruskal's $O(E \lg V)$ $O(E \lg V)$ $O(E \lg V)$
Bellman-Ford's $O(VE)$ $O(VE)$ $O(VE)$
Dijkstra's $O(V \lg V + E) $ $O(V^2)$ $O(V^2)$
Floyd-Warshall's $O(V^3)$ $O(V^3)$ $O(V^3)$
DAG-shortest-path $O(V+E)$ $O(V+E)$ $O(V+E)$

Other algorithms

Written by

hackerman7331
Last updated: Thu, 29 Oct 2020 14:36:19 +0100 .