This is an old version of the compendium, written May 13, 2014, 11:13 a.m. by gombos.

TDT4258: Energy Efficient Computer Design

- Curriculum
    - Computers as Components, 2nd or 3rd edition
        - Both editions: Chapters 1-6
        - 2nd edition: Chapter 8.2.1
        - 3rd edition: Chapter 8.4.4
    - EFM32 Application Note 27: Energy Optimization
    - EFM32 Application Note 61: Energy Harvesting
    - Compendium
    - Lecture slides

# Chapter 1 - Embedded computing

An embedded computer system is any device that includes a programmable computer but is not itself intended to be a general-purpose computer. Embedded systems are more complex to design than ordinary PCs because they must meet multiple design constraints. Constraints on cost, size and performance are often much stricter than elsewhere.

## Characteristics of embedded computing algorithms

Embedded computing systems have to provide sophisticated functionality:

- Complex algorithms - The operations performed by a microprocessor may be very sophisticated.
- User interface - Microprocessors are frequently used to control complex user interfaces that may include multiple menus and many options (e.g. the moving maps in GPS navigation).

## Summary

Embedded computing, while fun, is often very complex. This is mostly due to the strict constraints that must be satisfied. Trying to hack together a complex embedded system probably won't work, so a design process is needed. Requirements such as meeting deadlines, consuming very little power and fitting within a certain size are very common in embedded systems. Both top-down and bottom-up design processes might be needed to successfully develop an embedded system.

# Chapter 2 - Instruction sets

## Computer architecture

A computer usually has one of two architectures: either von Neumann or Harvard. Both architectures have a CPU, but the memory design differs. A von Neumann machine has one memory and one CPU connected by a bus. In this architecture the CPU has an internal register, the __program counter (PC)__, which keeps track of which part of memory is being executed.
The PC is then updated to run a different instruction from memory. The Harvard architecture, on the other hand, has one memory for programs and one for data, both connected to the CPU. One effect of the Harvard architecture is that it is harder to write self-modifying programs, since the data is completely separate from the program. The Harvard architecture is widely used today for one very good reason: it allows higher performance for digital signal processing. Since there are two memory ports, you get higher bandwidth. It also makes it easier to move data at the correct times.

## Assembly languages

Another axis along which one can classify computer architectures relates to their instruction sets and how they are executed. Many early computers had an instruction set of the kind known today as __CISC (complex instruction set computers)__. These have many instructions that can do all sorts of things, like finding a substring in a string, and lots of instructions with varying lengths. Another way of building computers is known as __RISC (reduced instruction set computers)__. This style of architecture tends towards fewer and simpler instructions.
RISC machines tended to use a __load/store__ instruction set. This means that one cannot manipulate data directly in memory, but must load data into registers and store it back to memory when the task is done.
RISC instructions were chosen so that they could be effectively pipelined in the processor, and implementations were heavily optimized. Early RISC machines substantially outperformed CISC machines, though this gap has narrowed in later years. Some other characteristics of assembly languages are:

- __Word length__ (4-bit, 8-bit, 16-bit, 32-bit, ...): the length of each block of data the processor handles as one unit
- __Little-endian__ vs __big-endian__: whether the lowest-order byte resides at the lower (little) or higher (big) end of the word
- __Single-issue__, __multiple-issue__, __superscalar__ and __VLIW__ instruction execution

There are different assembly languages for each computer architecture, but they usually share the same basic features:

- One instruction appears per line.
- Labels, which give names to memory locations, start in the first column.
- Instructions must start in the second column or later to distinguish them from labels.
- Comments run from some designated comment character to the end of the line.

## Multi-instruction execution

In some cases, where instructions have no dependencies between them, the CPU can execute several instructions simultaneously. One technique to achieve this (used by desktop and laptop computers) is superscalar execution. A superscalar processor scans the program during execution to find sets of instructions that can be executed together. Another technique for simultaneous instruction execution is to use __very long instruction word (VLIW)__ processors. These rely on the compiler to identify sets of instructions that can be executed in parallel.

### Superscalar vs. VLIW

- Superscalar execution uses a dynamic approach, where hardware on the processor does all the work.
- VLIW execution uses a static approach, where all the work is done by the compiler.
- Superscalar execution can find parallelism that VLIW processors can't.
- Superscalar processors are more expensive in both cost and energy consumption.
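The little-endian vs. big-endian distinction above can be checked at runtime by inspecting the byte layout of a word. A minimal C sketch (the function name is our own):

```c
#include <stdint.h>

/* Returns 1 on a little-endian machine, 0 on a big-endian one.
 * On a little-endian machine the lowest-order byte of the word
 * (0x04) is stored at the lowest address; on a big-endian machine
 * the byte at the lowest address is the highest-order byte (0x01). */
int is_little_endian(void)
{
    uint32_t word = 0x01020304u;
    uint8_t first_byte = *(uint8_t *)&word; /* byte at the lowest address */
    return first_byte == 0x04;
}
```

ARM cores can typically be configured for either byte order, but are most often run little-endian.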
## ARM

ARM is a family of RISC (reduced instruction set computer) architectures. ARM instructions are written as follows:

```
        LDR r0,[r8]     ; comment goes here
label   ADD r4,r0,r1
```

We can see that ARM instructions are written one per line, and that comments begin with a semicolon and continue to the end of the line. A label gives a name to a memory location and comes at the beginning of the line, starting in the first column.

### Memory organization

The ARM architecture supports two basic types of data:

- The standard ARM word is 32 bits long.
- One word may be divided into four 8-bit bytes.

ARM is a load-store architecture, which means that data operands must first be loaded into the CPU, and the results stored back to main memory afterwards.

### ARM data operations

ARM uses a load-store architecture for data operations. It has 16 general-purpose registers (r0 to r15), though some are often used for a specific task. r15 is used as the program counter, so it should obviously not be overwritten. Another important register is the __current program status register (CPSR)__, which holds information about arithmetic, logical and shifting operations. The CPSR has the following useful information in its top four bits:

- The negative (N) bit is set when the result is negative in two's-complement arithmetic.
- The zero (Z) bit is set if the result is zero.
- The carry (C) bit is set when there is a carry out of the operation.
- The overflow (V) bit is set when an arithmetic operation results in an overflow.

r11 is used as the __frame pointer (fp)__. This register points to the current __frame__, the block of memory holding the local variables of the procedure being executed right now. To access variables within a frame you subtract an offset from the frame pointer.

### Small ARM examples

Please see the book for a bigger ARM assembly reference.

#### Translate this expression to assembly

__NOTE:__ a is at -24, b at -28 and z at -44 relative to the frame pointer. Translate this C code to ARM.
`z = (a << 2) | (b & 15);`

```
ldr r3, [fp, #-24]
mov r2, r3, asl #2
ldr r3, [fp, #-28]
and r3, r3, #15
orr r3, r2, r3
str r3, [fp, #-44]
```

One thing to take from this example is that when computing an expression with multiple parts, you always start at the innermost part and work your way out.

#### Implement this if-statement

C code:

```
if (a > b) {
    a = 5;
} else {
    b = 3;
}
```

Assembly:

```
.L1:
        ldr r2, [fp, #-24]
        ldr r3, [fp, #-28]
        cmp r2, r3
        ble .L3             ; jump to false block if a <= b
; true block
        mov r3, #5
        str r3, [fp, #-24]
        b .L4
; false block
.L3:
        mov r3, #3
        str r3, [fp, #-28]
.L4:
; continue here
```

# Chapter 3 - CPUs

## Input and output mechanisms

I/O devices typically have several registers: data registers and status registers.

- Data registers hold values that are treated as data by the device.
- Status registers provide information about the device's operation, such as whether the current transaction has completed.

### Busy/wait

Using busy/wait, the CPU repeatedly tests the device status while the I/O transaction is in progress, which is extremely inefficient. The CPU could do useful work in parallel with the I/O transaction, and to allow this we can use interrupts.

### Interrupts

Using interrupts, the device can force the CPU to execute a particular piece of code, so the CPU can do other work while the I/O transaction is in progress. This is done by an interrupt handler routine (part of a device driver) that takes care of the device.

## Supervisor mode, exceptions, and traps

## Memory management and address translation

## Caches

## Performance and power consumption of CPUs

# Chapter 4 - Computing platforms

Coming soon

# Chapter 5 - Program design and analysis

Coming soon

# Chapter 6 - Processes and operating systems

Coming soon
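The busy/wait scheme from the I/O section above can be sketched in C. This is a minimal sketch for a hypothetical device with one status and one data register; the register names, busy bit and function name are made up for illustration. On real hardware the two pointers would be fixed memory-mapped addresses taken from the device's data sheet.

```c
#include <stdint.h>

/* Pointers to the device's status and data registers. On real hardware
 * these would point at fixed memory-mapped addresses; here they are
 * ordinary pointers so the sketch can be exercised in plain C. */
volatile uint32_t *dev_status;
volatile uint32_t *dev_data;

#define STATUS_BUSY 0x1u  /* assumed: bit 0 is set while a transfer is in progress */

/* Busy/wait output: start a transaction by writing the data register,
 * then spin on the status register until the device clears its busy bit. */
void dev_write_byte(uint8_t byte)
{
    *dev_data = byte;                   /* start the transaction */
    while (*dev_status & STATUS_BUSY)   /* poll: the CPU does no useful work here */
        ;                               /* spin until the device is done */
}
```

The `volatile` qualifier is essential: it prevents the compiler from caching the status register in a CPU register and optimizing the polling loop away.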