This is an old version of the compendium, written Nov. 24, 2017, 6:18 p.m. Changes made in this revision were made by larshb. View rendered version.

TFE02: Hardware software codesign with embedded systems

# Disclaimer This compendium is a work in progress and needs review and possibly restructuring. The current structure is based on the ordering and naming of lectures from the 2017 semester. # Estimation and partitioning Estimation and partitioning is important to evaluate the quality of an architecture. ## Architecture exploration __Architecture exploration__ is the process of mapping out the hardware composition of a system. This is typically done at an early stage and may include a choice of processors, application specific hardware (such as ASICs), reprogrammable hardware (such as FPGAs) and memories. ## Partitioning Following architecture exploration is typically __partitioning__, where functionality of the application is grouped. The groups are mapped onto architectural components. ### Kernighan–Lin algorithm The __Kerninghan-Lin__ minimal cut algorithm is a [hill climbing](https://en.wikipedia.org/wiki/Hill_climbing), [heuristic](https://en.wikipedia.org/wiki/Heuristic_(computer_science)) algorithm for finding [partitions of graphs](https://en.wikipedia.org/wiki/Graph_partition). It can be utilized to determine optimal distributions of functionality between hardware and software for eg. accelerators. Read more about this algorithm over at [Wikipedia](https://en.wikipedia.org/wiki/Kernighan%E2%80%93Lin_algorithm). ## Fidelity The term __fidelity__ is used to describe the quality of an estimation methodology, often by comparing estimated results with actual results. We use the following formula for fidelity: $$ \mathcal{F} = \frac{2}{n(n-1)}\sum_{i=1}^{n-1}{\sum_{j=i+1}^{n}{\mu_{ij}}}, \quad\text{where }\\ \mu_{ij} = (E_i<E_j \land M_i<M_j) \lor (E_i>E_j \land M_i>M_j), $$ where $E$ is the estimated values and $M$ is the measured values of each solution sample. ### Example We have made an estimation of the power of different solutions to an architecture. The model is sampled with 5 variations of implementation. ||Solution index||E ||M || ||1 ||40||22|| ||2 ||50||54|| ||3 ||60||73|| ||4 ||70||62|| ||5 ||80||72|| $$\begin{matrix} \mu_{12}=1&\mu_{13}=1&\mu_{14}=1&\mu_{15}=1\\ \mu_{23}=1&\mu_{24}=1&\mu_{25}=1\\ \mu_{34}=0&\mu_{35}=0\\ \mu_{45}=1 \end{matrix}$$ $$ \mathcal{F} = \frac{2}{5*4}(1+1+1+1+1+1+1+0+0+1) = 0.8 = 80 \% $$ # Multiprocessors and accelerators # Instruction set extensions # On-chip communication architechtures # Implementing embedded systems # Retargetable compilers ## Acronyms || IR||Intermediate representation|| ||CFG||Control flow graph|| ||DFG||Data flow graph|| ## Loop unrolling example __Original code:__

$N=16$ loop iterations

for (i = 0; i < 15; i++) { p[j] = foo(j); } __Modified code:__ With a loop factor $f=3$ for (i = 0; i < 15; i+=3) { p[j] = foo(j); p[j+1] = foo(j+1); p[j+2] = foo(j+2); } Note that loops with $N \mod f \neq 0$ must include additional lines of code for operations that do not fit in the new loop. # High Level Synthesis