TFE02: Hardware software codesign with embedded systems
# Disclaimer
This compendium is a work in progress and needs review and possibly restructuring. The current structure is based on the ordering and naming of lectures from the 2017 semester.
# Estimation and partitioning
Estimation and partitioning is important to evaluate the quality of an architecture.
## Architecture exploration
__Architecture exploration__ is the process of mapping out the hardware composition of a system. This is typically done at an early stage and may include a choice of processors, application specific hardware (such as ASICs), reprogrammable hardware (such as FPGAs) and memories.
## Partitioning
Following architecture exploration is typically __partitioning__, where functionality of the application is grouped. The groups are mapped onto architectural components.
### Kernighan–Lin algorithm
The __Kerninghan-Lin__ minimal cut algorithm is a [hill climbing](https://en.wikipedia.org/wiki/Hill_climbing), [heuristic](https://en.wikipedia.org/wiki/Heuristic_(computer_science)) algorithm for finding [partitions of graphs](https://en.wikipedia.org/wiki/Graph_partition). It can be utilized to determine optimal distributions of functionality between hardware and software for eg. accelerators.
Read more about this algorithm over at [Wikipedia](https://en.wikipedia.org/wiki/Kernighan%E2%80%93Lin_algorithm).
## Fidelity
The term __fidelity__ is used to describe the quality of an estimation methodology, often by comparing estimated results with actual results.
We use the following formula for fidelity:
$$ \mathcal{F} = \frac{2}{n(n-1)}\sum_{i=1}^{n-1}{\sum_{j=i+1}^{n}{\mu_{ij}}}, \quad\text{where }\\
\mu_{ij} = (E_i<E_j \land M_i<M_j) \lor (E_i>E_j \land M_i>M_j),
$$
where $E$ is the estimated values and $M$ is the measured values of each solution sample.
### Example
We have made an estimation of the power of different solutions to an architecture. The model is sampled with 5 variations of implementation.
||Solution index||E ||M ||
||1 ||40||22||
||2 ||50||54||
||3 ||60||73||
||4 ||70||62||
||5 ||80||72||
$$\begin{matrix}
\mu_{12}=1&\mu_{13}=1&\mu_{14}=1&\mu_{15}=1\\
\mu_{23}=1&\mu_{24}=1&\mu_{25}=1\\
\mu_{34}=0&\mu_{35}=0\\
\mu_{45}=1
\end{matrix}$$
$$ \mathcal{F} = \frac{2}{5*4}(1+1+1+1+1+1+1+0+0+1) = 0.8 = 80 \% $$
# Multiprocessors and accelerators
# Instruction set extensions
# On-chip communication architechtures
# Implementing embedded systems
# Retargetable compilers
## Acronyms
|| IR||Intermediate representation||
||CFG||Control flow graph||
||DFG||Data flow graph||
## Loop unrolling example
__Original code:__
for (i = 0; i < 15; i++)
{
p[j] = foo(j);
}
__Modified code:__
With a loop factor $f=3$
for (i = 0; i < 15; i+=3)
{
p[j] = foo(j);
p[j+1] = foo(j+1);
p[j+2] = foo(j+2);
}
Note that loops with $N \mod f \neq 0$ must include additional lines of code for operations that do not fit in the new loop.
# High Level Synthesis