# TDT4287: Algorithms for Bioinformatics

Disclaimer: I haven't really attended any lectures, and I'm writing this as I go. Also, all code is pseudocode.

# Curriculum (2016)

The following chapters in the book (Jones & Pevzner) are in the curriculum:

- 4.4-4.9 (Exhaustive Search)
- 5.1-5.5 (Greedy Algorithms)
- 6.4-6.14 (Dynamic Programming Algorithms)
- 7.2-7.4 (Divide-and-Conquer Algorithms)
- 8.1-8.9 (Graph Algorithms)
- 9.3-9.8 (Combinatorial Pattern Matching)
- 12.2-12.3 (Randomized Algorithms)

(or, at least these are the chapters listed under lecture plan on It's Learning)

# Exhausitve Search

We would like to infer a common sequence from a set `window`

iterate over sublist `0,1,..,k`

, `1,2,..,k+1`

, ...):

```
seqs = set()
for s in S:
for seq in s.window(k):
seqs[seq] += 1
return max(seqs)
```

However, sometimes the common sequence may have undergone simple mutations, which renders our algorithm useless. We could somehow align the sequences in

## Motif Finding

The motif finding problem is exactly this.
The main problem is to find the alignment of the sequences.
We would want to define a scoring function, which ranks the alignment;
for instance we could sum the frequencies of the most popular elements (this is what the book does), making the best possible score

## Median String Problem

A related problem is the Median String Problem.
We would like to find a sequence that minimizes the Hamming distance between itself and all sequences in

## Running Time?

Both of these algorithms are very slow; they are in fact exponential.
However, there is a simple trick we can use to leverage this: branch and bound.
Assume we are in the middle of the search, and that we keep track of the best candidate so far.
If now, we are in the middle of ranking a sequence

For instance, say we are doing the Median String problem, and start with

Note that branch and bound doesn't improve the Big-O complexity of the algorithm, but it does reduce running time significantly.

# Some Algorithms

## Alignment

There are three types of alignment: global, local and multiple.
Global alignment tries to align entire strings, and local alignments tries to align substrings, in order to get a better alignment.
Both global and local uses a scoring matrix (

### Global Alignment

We usually use DP for ths. The recurrence relation for each entry is

that is, either skip one of

As there are

### Local Alignment

For the naïve approach, we can use the recurrence from Global Alignsment, and add a new case:

### Multiple Alignment

Multiple Alignment takes multiple strings and tries to align them best. The problem is in NPC.

## Edit Distance

The Edit Distance problem finds the minimum number of "edits" to transform one string

## Longest Common Subsequence

Given two strings