Table of Contents
  1. Chapter 1 - Embedded computing
    1. Characteristics of embedded computing algorithms
    2. Properties
    3. Embedded vs General Purpose Systems
    4. System design flow
    5. Summary
  2. Chapter 2 - Instruction sets
    1. Computer architecture
    2. Assembly languages
    3. Multi-instruction execution
    4. ARM
      1. Memory organization
      2. ARM data operations
      3. Small ARM examples
        1. Translate this expression to assembly
        2. Implement this if-statement
      4. Advanced ARM features
        1. SIMD Example
  3. Chapter 3 - CPUs
    1. Input and output mechanisms.
      1. Example of writing to and reading from a memory mapped I/O device
      2. Busy/wait
      3. Interrupts
    2. Supervisor mode, exceptions, and traps
      1. Supervisor mode
      2. Exceptions
      3. Traps
    3. Memory system mechanics
      1. Caches
        1. Types of cache misses
        2. Cache placement policies
      2. Memory management and address translation.
    4. Performance and power consumption of CPUs.
  4. Chapter 4 - Computing platforms
    1. Buses
    2. Memory
      1. DRAM
      2. SRAM
    3. Choosing a computing platform
    4. Consumer electronic devices
  5. Chapter 5 - Program design and analysis
    1. Circular buffers and stream-oriented programming
      1. Example
    2. Models of programs
      1. Data flow graphs
      2. Control/data flow graphs
    3. Assembly, linking and loading
      1. Program generation workflow
      2. Assembler
      3. Linker
    4. Cyclomatic complexity
  6. Chapter 6 - Processes and operating systems
    1. Processes
      1. Real-time criteria
    2. Virtual memory
    3. Real-time Operating Systems
    4. Scheduling
      1. Round Robin
      2. Rate-Monotonic Scheduling
      3. Earliest Deadline First
    5. Interprocess Communication
  7. Chapter 7 - Stuff
  8. Chapter 8 - I$^2$C bus

TDT4258: Low-Level Programming

Tags:
  • microcontroller
  • energy
  • assembly
  • eeds
  • microcontrollers
  • c-programming
  • energymicro
  • c
  • lowlevel

The devil is known by many names. This devil is also known as Energy efficient computer systems and Microcontroller system design.

Curriculum

Book Computers as Components, 2nd or 3rd edition, chapters 1-6 + (2nd edition: chapter 8.2.1 or 3rd edition: chapter 8.4.4) **
Compendium Lab exercise compendium. Contains all the required information for the lab exercises, including assembly and C
Lecture slides Some lectures go beyond the book. Papers and documents published on Blackboard
EFM32 Application Note 27: Energy Optimization *
EFM32 Application Note 61: Energy Harvesting *

* Not specifically listed in 2018

** Chapters from 2015 curriculum

Chapter 1 - Embedded computing

An embedded computer system is any device that includes a programmable computer but is not itself intended to be a general-purpose computer.

Embedded systems are more complex to design than normal PCs, as they must meet multiple design constraints. Constraints on cost, size and performance are often much stricter than elsewhere.

Characteristics of embedded computing algorithms

Embedded computing systems may provide sophisticated functionality.

Complex algorithms Operations performed by a microprocessor may be very sophisticated
User interface Microprocessors are frequently used to control complex user interfaces that may include multiple menus and many options (e.g. the moving maps in GPS navigation).

Properties

Reactive must react to stimuli from environment
Time constrained must meet real-time constraints

Embedded vs General Purpose Systems

Embedded systems                  General Purpose Systems
Few applications                  Broad range of applications
Not programmable by end user      Programmable by end user
Fixed performance requirements    Higher performance is useful
Design criteria:
  Cost                            Cost
  Power consumption               Power consumption
  Worst-case speed                Average speed

System design flow

Requirements → Specification → Architecture → Components → Integration

Left to right Top-down design
Right to left Bottom-up design

Summary

Embedded computing, while fun, is often very complex. This is mostly due to the strict constraints that must be satisfied. Trying to hack together a complex embedded system probably won't work, so a design process is needed. Requirements such as meeting certain deadlines, consuming very little power, and fitting within a certain size are very common in embedded systems. Both top-down and bottom-up design processes might be needed to successfully develop an embedded system.

Chapter 2 - Instruction sets

Computer architecture

A computer usually has one of two architectures: von Neumann or Harvard. Both architectures have a CPU, but the memory design differs. The von Neumann architecture has one shared memory for data and programs, connected to the CPU by a bus. In this architecture the CPU has an internal register called the program counter (PC), which holds the address of the instruction currently being executed. The PC is then updated so the next instruction can be fetched from memory. The Harvard architecture, on the other hand, has one memory for programs and one for data, both connected to the CPU. One effect of the Harvard architecture is that it is harder to write self-modifying programs, since the data is completely separate from the program. The Harvard architecture is widely used today for one very good reason: it allows for higher performance in digital signal processing. Since there are two memory ports, the available memory bandwidth is higher. It also makes it easier to move data at the correct times.

A microcontroller is defined as a single-chip computer that includes a processor, memory and programmable I/O devices.

Assembly languages

Another axis along which one can view computer architectures relates to their instruction sets and how instructions are executed. Many early computers had instruction sets of the kind today known as CISC (complex instruction set computers). These feature many instructions that can do all sorts of things, like finding a substring in a string, with loads and loads of instructions of varying length. Another way of building computers is known as RISC (reduced instruction set computers). This style tends towards fewer and simpler instructions. RISC machines tend to use a load/store instruction set. This means that you can't manipulate data directly in memory, but have to load data into registers and store it back to memory when the task is done. RISC instructions were chosen so that they could be efficiently pipelined in the processor, and were heavily optimized. Early RISC machines substantially outperformed CISC machines, though the gap has narrowed in recent years.

Some other characteristics of assembly languages are:

Word length (4-bit, 8-bit, 16-bit, 32-bit...) the size of the data words the processor operates on
Endianness Little-endian vs big-endian: whether the lowest-order byte resides in the lower (little) or higher (big) bits of the word (see the sketch below)
Parallelism Single-issue instruction, multiple-issue instruction, superscalar and VLIW
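
A quick way to see endianness in practice is to view the bytes of a known word through a byte pointer. A minimal C sketch (our own example):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t word = 0x01020304;
    uint8_t *bytes = (uint8_t *) &word;  // view the word one byte at a time

    // On a little-endian machine the lowest-order byte (0x04) is stored first.
    printf("%s-endian\n", bytes[0] == 0x04 ? "little" : "big");
    return 0;
}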

There are different assembly languages per computer architecture, but they usually share the same basic features:

  • One instruction appears per line.
  • Labels, which give names to memory locations, start in the first column.
  • Instructions must start in the second column or after to distinguish them from labels.
  • Comments run from some designated comment character to the end of the line.

Multi-instruction execution

In some cases, where instructions have no dependencies on each other, the CPU can execute several of them simultaneously. One technique to achieve this (used by desktop and laptop computers) is superscalar execution. A superscalar processor scans the program during execution to find sets of instructions that can be executed together. Another technique for simultaneous instruction execution is very long instruction word (VLIW) processors. These processors rely on the compiler to identify sets of instructions that can be executed in parallel.

Superscalar                                             VLIW
Execution uses a dynamic approach, where hardware       Execution uses a static approach, where all the
on the processor does all the work.                     work is done by the compiler.
Can find parallelism that VLIW processors can't.
More expensive in both cost and energy consumption.

ARM

ARM is a family of RISC (Reduced instruction set computer) architectures. ARM instructions are written as follows:

        LDR r0,[r8] ; comment goes here
label:  ADD r4,r0,r1

We can see that ARM instructions are written one per line, and that comments begin with a semicolon and continue to the end of the line. The label gives a name to a memory location, and comes at the beginning of the line, starting in the first column.

ARM assembly can also use C-like comments and block comments.

        LDR r0,[r8] ; comment goes here
        // and here
label:  ADD r4,r0,r1
        /* and even
         * here
         */

Memory organization

The ARM architecture supports two basic types of data:

  • The standard ARM word is 32 bits long.
  • One word may be divided into four 8-bit bytes.

ARM is a load-store architecture, which means that data operands must first be loaded into CPU registers before they can be operated on, and the results must be stored back to main memory.

ARM data operations

ARM uses a load-store architecture for data operations. It has 16 general-purpose registers (r0 to r15), though some are often used for a specific task. r15 is used as the program counter and should obviously not be overwritten. Another important register is the current program status register (CPSR), which holds information about arithmetic, logical and shifting operations.

The CPSR has the following useful information in its top four bits:

  • The negative (N) bit is set when the result is negative in two's-complement arithmetic.
  • The zero (Z) bit is set if the result is zero.
  • The carry (C) bit is set when there is a carry out of the operation.
  • The overflow (V) bit is set when an arithmetic operation results in an overflow.

r11 is used as the frame pointer (fp). This register points to the end of the previous frame. A frame is the block of stack memory belonging to the function call currently being executed. To access variables within a frame, you subtract an offset from the frame pointer. Frames were introduced to allow nested function calls and recursion, as well as a structured way of handling function arguments and return values. The frame pointer is technically not necessary unless the frame can grow during execution.

r13 is the stack pointer; it points to the end of the frame currently being executed.

r14 is the link register; it contains the address to return to after a function call has completed. Now, you may be yelling at your screen, frustrated that I just said that that's what stack frames are for. Fret not, there is logic to this. The link register is an architectural feature provided by ARM to support returning from function calls. It gets overwritten each time a branch-and-link instruction is executed, so recursion and nested function calls can't rely on it alone. This is what makes the stack frame structure necessary, as sketched below.
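
A minimal assembly sketch of this interplay (our own example): a leaf function can return through lr directly, while a function that itself executes a branch-and-link must save lr on the stack first.

caller:
        bl   callee         ; lr = return address into caller

callee:
        push {fp, lr}       ; save lr: the bl below would overwrite it
        bl   helper
        pop  {fp, pc}       ; restore fp and return by popping the old lr into pc

helper:
        bx   lr             ; leaf function: lr still holds the return address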

Small ARM examples

Please see the book for a bigger reference to ARM assembly.

Translate this expression to assembly

NOTE: a is at -24, b at -28 and z at -44 with respect to the frame pointer.

Translate this C code to ARM. z = (a << 2) | (b & 15);

ldr r3, [fp, #-24]  ; r3 = a
lsl r2, r3, #2      ; r2 = a << 2
ldr r3, [fp, #-28]  ; r3 = b
and r3, r3, #15     ; r3 = b & 15
orr r3, r2, r3      ; r3 = (a << 2) | (b & 15)
str r3, [fp, #-44]  ; z = r3

One thing to take from this example: when computing an expression with multiple parts, always start with the innermost subexpressions and work your way out.

Implement this if-statement

C-code:

if (a > b) {
    a = 5;
} else {
    b = 3;
}

Assembly:

.L1:
    ;r2=a r3=b
    ldr r2, [fp, #-24]
    ldr r3, [fp, #-28]
    cmp r2, r3

    ; jump if true
    bgt .L3

    ; false block
    mov r3, #3
    str r3, [fp, #-28]
    b .L4

.L3:
    ; true block
    mov r2, #5
    str r2, [fp, #-24]

.L4:
    ; continue here

Advanced ARM features

Many ARM processors provide advanced features for less general applications.

DSP Extensions are provided for improved digital signal processing. For instance, MAC (multiply-and-accumulate) instructions can be applied to 16x16 or 32x16 operands.
SIMD (Single Instruction Multiple Data) A single register is treated as having several smaller sets of data. The same operation is then applied to each single element.
NEON NEON instructions are an extension of SIMD, and provide not only instructions optimized for vectors of data, but also larger registers, enabling a larger degree of data parallelism.
TrustZone Security features. A special instruction is available for entering TrustZone, a separate processor mode, allowing one to perform operations not permitted in normal mode.
Jazelle Allows for direct execution of 8-bit Java™ bytecode instructions, removing the need for an interpreter.

SIMD Example

Two 32-bit registers are considered to consist of subsets of data, each subset a byte.

r1 = 0x00 0xfe 0x00 0xfe
r2 = 0x11 0x01 0x10 0x01

Performing a SIMD-add on the two registers would result in the following

simd_add r1, r1, r2
r1 = 0x11 0xff 0x10 0xff

ie. $$ \{00_{16}+11_{16},\quad FE_{16}+01_{16},\quad 00_{16}+10_{16},\quad FE_{16}+01_{16}\} $$

Chapter 3 - CPUs

Input and output mechanisms.

I/O devices typically have several registers: data registers and status registers.

Data registers Hold values that are treated as data by the device.
Status registers Provide information about the device's operation, such as whether the current transaction has completed.

It is very common for a device's status and data registers to be mapped into the main memory address space. Some architectures (e.g. Intel x86) instead provide a separate address space for I/O devices and special instructions for reading from and writing to devices.

Example of writing to and reading from a memory mapped I/O device

Our device has a status register and a data register. The status register is mapped to memory address 0x40006080 and data to 0x40006084. Both registers are 32 bits in size.

Reading from the status register and writing the value to the data register in ARM assembly (this is also a good example of how to get a full 32-bit address into a register when it cannot fit inside a 32-bit instruction word: the LDR r0,=STATUS pseudo-instruction makes the assembler place the constant in a literal pool and load it from there):

STATUS    = 0x40006080

; Read from status register
LDR r0, =STATUS
LDR r1, [r0]

; Write to data register
STR r1, [r0, #0x4]

The corresponding C-code:

#include <stdint.h>

#define STATUS ((volatile uint32_t *) 0x40006080)

// Reading
uint32_t status_val = *STATUS;

// Writing
*(STATUS + 1) = status_val;

The pointer arithmetic STATUS + 1 addresses the next 32-bit word, i.e. the address 4 bytes higher (0x40006084), matching the #0x4 offset in the assembly example. The volatile qualifier tells the compiler that the register contents may change at any time, so the accesses must not be optimized away or reordered.

Busy/wait

Using busy/wait, the CPU tests the device status while the I/O transaction is in progress, and is therefore extremely inefficient. The CPU could do useful work in parallel with the I/O transaction, and to allow this, we can use interrupts.

Caveat: Although previous exams state that busy/wait and polling are different names for the same technique, they are not the same. Busy-wait polling means continuously reading something (burning CPU cycles) until it responds with a specific output, while plain polling just means reading at specific times, without necessarily blocking the rest of the code from executing.
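
In C, busy/wait looks like a loop spinning on the status register. A minimal sketch, reusing the device from the example above and assuming (our assumption) that bit 0 of the status register means "data ready":

#include <stdint.h>

#define STATUS ((volatile uint32_t *) 0x40006080)
#define DATA   ((volatile uint32_t *) 0x40006084)

uint32_t busy_wait_read(void)
{
    while ((*STATUS & 0x1) == 0)  // burn CPU cycles until data is ready
        ;
    return *DATA;                 // then read the data register
}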

Interrupts

Using interrupts, the device can force the CPU to execute a particular piece of code, so the CPU can do other work while the I/O transaction is in progress or while the device has no need for the CPU's attention. This is done using an interrupt handler routine (or device driver) that gets called when the device generates an interrupt. Typically, each I/O device can generate interrupts, making a mechanism for gathering and prioritising interrupts necessary. An interrupt controller handles this prioritisation and ordering of different kinds of interrupts, and chooses which interrupt to forward to the CPU. Based on where the interrupt originated, the CPU suspends its execution, saves its current state to the stack, and calls the corresponding interrupt handler.
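
As a sketch of what a handler for the device above might look like (the handler name is our own; how it is wired into the vector table is platform specific):

#include <stdint.h>

#define DATA ((volatile uint32_t *) 0x40006084)

volatile uint32_t last_value;  // shared with the main program

void device_irq_handler(void)
{
    last_value = *DATA;  // service the device while the main program does other work
    // a real handler would also clear the device's interrupt flag here
}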

Supervisor mode, exceptions, and traps

Supervisor mode

Some processor architectures provide a supervisor mode. Most programs run in user mode, but some tasks are usually reserved for execution in supervisor mode. Supervisor modes are less frequently provided by e.g. DSPs and more frequently by processors intended for more complex systems, typically systems needing an OS. For instance, on systems with an MMU (memory management unit), writing to the MMU config registers can typically only be done in supervisor mode.

Exceptions

An exception can be viewed as a CPU-generated interrupt, signaling an internally detected error. If a program for instance tries to divide by zero, the CPU will generate an exception and a predefined routine will be called to handle the error. Which routine is called is, as with interrupts, defined by a vector, and the order in which exceptions are handled is defined by their priority.

Traps

A trap, also known as a software interrupt, is an instruction that explicitly generates an exception condition. The most common use of a trap is to enter supervisor mode.

Memory system mechanics

Processor clock speeds have been increasing at a much higher rate than memory access speeds. It is therefore useful to have ways of narrowing the gap between processing speed and memory read/write speed.

Caches

A cache is a small and fast memory used to keep parts of the main memory close at hand for quicker look-ups by the CPU. The cache contains copies of parts of main memory, so the CPU can quickly look up values it needs rather than always fetching the same value from main memory (a cache look-up often takes only a few ns, while a main memory look-up typically takes 50-75 ns). Between the CPU and the cache/main memory sits a cache controller. When the CPU needs data, the cache controller sends requests to both the cache and the main memory. If the data is in the cache (a cache hit), the main memory request is aborted. If not (a cache miss), the data is retrieved from main memory. Writing to cache/memory requires a bit more work, as both the cache and the main memory need to be updated.

The L1 cache is the cache closest to the CPU. It is split, which means that it is divided into one part that contains instructions and another part that contains data.

Types of cache misses

Compulsory miss
Occurs when trying to access a memory block for the first time
Capacity
Occurs when the cache size is smaller than the current working set (the dataset the CPU is currently working with).
Conflict
Occurs when the data required was in the cache previously, but got evicted. These evictions occur because another request was mapped to the same cache line.

Cache placement policies

Direct mapped
Each memory block maps to exactly one cache line; $n$ sets of one line each ($n \times 1$)
Fully associative
A memory block can occupy any of the $m$ cache lines (one set of $m$ lines, $1 \times m$)
Set associative
A trade-off between the two: the cache is divided into $n$ sets and each set contains $m$ cache lines ($n \times m$)


Memory management and address translation.

In modern CPUs, the memory management unit (MMU) is responsible for mapping the virtual (logical) addresses used by the processor to the actual physical addresses in RAM, using page tables.

Performance and power consumption of CPUs.

Chapter 4 - Computing platforms

This chapter talks about the three main parts of a computing platform: The microprocessor, the memory and the I/O devices. A microcontroller is a single chip that contains all these things.

A computing platform is equally dependent on both hardware and software - each needs the other to perform its function. Most embedded systems include a hardware abstraction layer (HAL) to provide a basic level of abstraction from the hardware. This is frequently used by device drivers.

Buses

A bus provides a common connection between the CPU and the memory and/or I/O devices. The word bus refers both to the physical connection and to the protocol by which communication takes place. The signals transmitted on the bus include the data itself, addresses, a clock and some control signals.

The primary task of a bus is to provide an interface to memory. Every bus has a bus master, which ensures that only one transfer is active on the bus at a time. In a typical bus, the CPU functions as the bus master.

Most buses utilize the four-cycle handshake, which ensures a safe connection. It works in the following way:

  1. Device 1 signals that it is ready to transmit.
  2. Device 2 signals that it is ready to receive.
  3. Device 2 signals that it has received the data.
  4. Device 1 ends the transmission.

Data can be transmitted in different ways on the bus. A burst transfer refers to transmitting data repeatedly without going through all the steps required to transmit each piece of data in a separate transaction. A disconnected transfer refers to requesting the transfer, and then completing it later, freeing the processor in the meantime.

Direct Memory Access (DMA) refers to a bus operation that allows reading and writing data without involving the CPU. This is established through a two-step process: first, the DMA controller sends a bus request to the CPU; then, the CPU sends a bus grant to the DMA controller. When this is done, the DMA controller can act as bus master. The CPU enables DMA by writing to registers in the DMA controller; when the DMA operation is complete, the CPU receives an interrupt. Since the CPU can't access the bus while a DMA transfer is active, cyclic scheduling of DMA requests is frequently used, meaning that the DMA controller occupies the bus for only a few cycles at a time.
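
A sketch of how the CPU might set up such a transfer (the register addresses and layout here are purely hypothetical):

#include <stdint.h>

#define DMA_SRC   ((volatile uint32_t *) 0x40008000)  // source address register
#define DMA_DST   ((volatile uint32_t *) 0x40008004)  // destination address register
#define DMA_LEN   ((volatile uint32_t *) 0x40008008)  // transfer length register
#define DMA_CTRL  ((volatile uint32_t *) 0x4000800c)  // control register
#define DMA_START 0x1u

void dma_copy(uint32_t src, uint32_t dst, uint32_t len)
{
    *DMA_SRC  = src;
    *DMA_DST  = dst;
    *DMA_LEN  = len;
    *DMA_CTRL = DMA_START;  // start the transfer; completion raises an interrupt
}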

Multiple bus configurations are used, since the purpose of the bus may differ. Smaller buses have fewer connections and less risk of being occupied. Larger buses can access more of the system. Also, the higher the speed of the bus, the higher the cost. It is therefore common to connect different buses together. Interconnected buses utilize a bridge to communicate with each other. The bridge acts as a master for one bus, and a slave for the other.

Bus bandwidth refers to the data capacity of the bus. The transfer time in clock cycles can be measured with the two following formulas:

$$ T_{basic}(N) = (D + O) \frac{N}{W} $$ (for basic transfers, where $N$ is the data size in bytes, $D$ is the number of clock cycles one data transfer requires, $O$ is the overhead in cycles before and after the data, and $W$ is the bus width in bytes)

$$ T_{burst}(N) = (BD + O) \frac{N}{BW} $$ (for burst transfers, where $N$ is the data size in bytes, $D$ is the number of clock cycles one data transfer requires, $O$ is the overhead in cycles per burst, $B$ is the number of transfers per burst, and $W$ is the bus width in bytes)
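
A quick sanity check with made-up numbers: with a bus width of $W = 4$ bytes, $D = 1$ and $O = 3$, transferring $N = 1024$ bytes takes $T_{basic} = (1 + 3) \cdot \frac{1024}{4} = 1024$ cycles, while bursts of $B = 8$ transfers give $T_{burst} = (8 \cdot 1 + 3) \cdot \frac{1024}{8 \cdot 4} = 352$ cycles.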

Memory

DRAM

Dynamic random-access memory (DRAM) is organized as a two-dimensional array and is commonly used as main memory in modern computing platforms. Multiple types of DRAM are available, varying widely in speed, capacity and other capabilities. Memory for PCs often comes as S-/DIMMs: Single/Dual In-line Memory Modules. ROM and flash are two other types of memory in use. Memory components interact with the CPU through a memory controller.

To achieve parallelism in memory, we utilize channels and banks. Channels are separate connections to the CPU. Banks are separate memory blocks, each with its own memory array and addressing logic. Banks in the same memory device share the same channel. Channels are generally more expensive than banks.

Memory blocks can have different aspect ratios, meaning the height and width can vary. If the size is constant, increasing the height shrinks the width and vice versa.

SRAM

Static random-access memory (SRAM) is often used for caches. SRAM is faster than DRAM but uses more energy and area per bit. Unlike DRAM, it does not need to be refreshed periodically to retain its values.

Choosing a computing platform

When choosing a computing platform for your embedded system, you should match the CPU, bus, memory and I/O devices to your needs. You should also choose the right software to run on your system.

Consumer electronic devices

Consumer electronics must satisfy both functional and non-functional requirements. The former are technical specifications the system should support. The latter are things like battery life, looks and price.

Chapter 5 - Program design and analysis

Circular buffers and stream-oriented programming

Many embedded systems, especially DSP-based ones, work on streams of data. A circular buffer is a memory-efficient data structure for handling such streams. If we assume our algorithm needs a certain subset of the data stream to produce the next output, we can use a circular buffer to hold a window into the data stream at any given time. Since the size of this window doesn't change, the buffer can be of constant size as well.

In practice, a circular buffer is implemented as an array and a position variable pointing to the start of the current window in the array.

Example

Data stream:

data: 1 2 3 4 5
time  --->

At time $t$ the data window consists of 1 2 3 4, at $t+1$: 2 3 4 5.

Circular buffer at $t$:

buf = 1 2 3 4

Circular buffer at $t+1$:

buf = 5 2 3 4

At time $t$ we're done with the first data segment, and can therefore discard it making room for the new data segment arriving at $t+1$.
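
A minimal C sketch of such a buffer (names and window size are our own):

#include <stdint.h>

#define WINDOW 4

typedef struct {
    int32_t data[WINDOW];  // backing array holding the current window
    unsigned head;         // index of the oldest element
} circ_buf;

// Overwrite the oldest element with the newest sample.
void circ_buf_push(circ_buf *b, int32_t sample)
{
    b->data[b->head] = sample;
    b->head = (b->head + 1) % WINDOW;  // wrap around at the end of the array
}

// Element i of the current window, where 0 is the oldest.
int32_t circ_buf_get(const circ_buf *b, unsigned i)
{
    return b->data[(b->head + i) % WINDOW];
}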

Another important data structure for stream-oriented programming is the queue. While the circular buffer holds a constant number of elements, the queue can vary in size.

Models of programs

A control/data flow graph (CDFG) is used as a unifying way of modeling programs. It allows us to describe program behaviour regardless of what programming language is utilised or what platform it is supposed to run on. A typical CDFG consists of components describing both data and control operations.

Data flow graphs

A data flow graph (DFG) is a way of modeling data operations. In high-level languages, a block of code containing no conditionals (one entry point, one exit point) is known as a basic block. DFGs allow us to model basic blocks.

The first step in modeling a basic block is ensuring it conforms to single-assignment form, where each variable is assigned to only once. To rewrite an expression not in this form, one splits each variable that is assigned to more than once into multiple variables, one per assignment.
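
For instance (our own toy example), a basic block with two assignments to x gets a fresh name for each assignment:

int single_assignment_example(int a, int b, int c)
{
    /* Original block:  x = a + b;  x = x - c;  */
    int x1 = a + b;   /* first assignment gets its own name  */
    int x2 = x1 - c;  /* second assignment gets another name */
    return x2;
}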

With an expression in single-assignment form, we can represent variables as edges and operators as nodes in our DFG.

Control/data flow graphs

With our DFG in place, we can use a CDFG to model conditionals and control flow. (if/else, for-loops, while-loops, etc.) A CDFG encapsulates DFGs by representing them as data flow nodes. The other type of nodes in a CDFG are decision nodes, representing conditionals and control flow.

Assembly, linking and loading

Program generation workflow

A day in the life of a program™:

+----------------------+   +--------+   +-------------+   +---------+
|High level source code|-->|Compiler|-->|Assembly code|-->|Assembler|
+----------------------+   +--------+   +-------------+   +---------+
                                                               |
                                                               ▼
                                                         +-----------+
                                                         |Object code|
                                                         +-----------+
                                                               |
                                                               ▼
         +---------+    +------+    +-----------------+    +------+
         |Execution|<---|Loader|<---|Executable binary|<---|Linker|
         +---------+    +------+    +-----------------+    +------+

An assembler's main task is to generate binary instructions from the supplied assembly code. It must do this with respect to the instruction formats, and must also translate labels into actual addresses. Each assembly source file gets assembled into its own object file. Since code in one file may depend on labeled code in a different file, the object files need to be merged together in a sensible way. This is done by the linker.

Assembler

So how does the assembler actually generate object code? In short, it makes two passes through the code, the first one to generate a symbol table, and the second one to actually assemble each instruction at addresses calculated in the first pass.

For the time being, let us assume we know the starting address of the program. (Most compilers provide us with relative addressing, alleviating the need to worry about starting addresses.) During the first pass over the code, the assembler maintains a program location counter (PLC), starting at the starting address of the program and incrementing by the size of an instruction each time it advances a line. Each time the assembler encounters a label during this pass, it adds the label to the symbol table, associated with the current PLC value.
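
A tiny worked example (the numbers are our own): assume the program starts at address 0x100 and every instruction is four bytes long.

PLC     source line
0x100           ldr r0, =x
0x104   loop:   add r0, r0, #1
0x108           b loop

After the first pass, the symbol table contains a single entry: loop -> 0x104.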

During the second pass, the assembler generates binary instructions for each line of assembly in the source file at memory locations conforming to the information in the now generated symbol table. The symbol table is later used by the linker to structure the entire executable.

Linker

The next step in the process is the linking stage. Most programs are spread across several source files and are therefore assembled into separate object files, each with their own symbol table. The linker stitches together all object files into a single executable, making sure to resolve any external references in each object file. The linker also takes a linker script, defining where in the physical memory to place program and data.

Cyclomatic complexity

A simple measure of a program's control complexity is its cyclomatic complexity: the number of linearly independent paths through the execution of the program. It is calculated as follows:

$M = E - N + 2P$

where $E$ is the number of edges in the program's CDFG, $N$ is the number of nodes and $P$ is the number of connected components.
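
As a quick example (our own): the CDFG of a single if/else consists of a decision node, a true block, a false block and a join node ($N = 4$), connected by four edges ($E = 4$) in one connected component ($P = 1$), giving $M = 4 - 4 + 2 \cdot 1 = 2$: the two independent paths through the conditional.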

Chapter 6 - Processes and operating systems

This chapter talks about processes and operating systems, two fundamental abstractions that allow us to build complex applications on microprocessors. Since embedded systems have certain real-time requirements, such as deadlines, real-time operating systems (RTOSs) are commonly used here.

Processes

A process is a single execution of a program, and has its own state that includes registers and memory for that process. If multiple processes share the same address space, they are called threads.

Asynchronous input refers to input that does not coincide with a timer or at a certain rate; like a button in an app that can be pressed at any time.

A multirate system refers to a system that is required to perform tasks which differ in computational speed or rate, e.g. a camera that is capable of both taking pictures and shooting video. Multirate embedded computing systems like this camera are very common.

The operating system considers a process to be in one of three basic states: waiting, ready or executing. Only one process can execute on the CPU at any time, unless you have a multicore processor.

How much of the CPU time is used doing actual processing? This is called utilization, and can be found with the following formula: $ U = \frac{\text{CPU time for useful work}}{\text{total available CPU time}} $

or more generally

$ U = \frac{C}{T} = \frac{\text{Computation time}}{\text{Period}} = \frac{\text{Active}}{\text{Active} + \text{Idle}} $
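
For example, a process that needs $C = 3$ ms of computation every $T = 10$ ms period gives $U = \frac{3}{10} = 0.3$, i.e. 30% utilization.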

Processes in embedded systems often have timing requirements. The most common ones are initiation time, deadline and rate. Initiation time refers to the time it takes for a process to go from the waiting state to the ready state. The deadline is the time by which the process must be completed; failing to meet it can have fatal consequences. The rate specifies how often a process must be initiated.

Real-time criteria

(not curriculum?)

Real-time systems can be categorized by the significance of deadlines.

Category  Missed deadline
Hard      Failure.
Firm      Tolerable. The result has no value after the deadline.
Soft      The value of the result decreases after the deadline.

The period (or initiation interval) refers to the time between successive process executions. Jitter refers to how much variation we can allow in the time the process takes to complete its task. Data dependencies add an element of complexity to the timing requirements. A set of processes with data dependencies is known as a task graph.

Virtual memory

The computer's operating system, using a combination of hardware and software, maps memory addresses used by a program, called virtual addresses, into physical addresses in computer memory. Main storage, as seen by a process or task, appears as a contiguous address space or collection of contiguous segments. The operating system manages virtual address spaces and the assignment of real memory to virtual memory. Address translation hardware in the CPU, often referred to as a memory management unit (MMU), automatically translates virtual addresses to physical addresses. Software within the operating system may extend these capabilities to provide a virtual address space that can exceed the capacity of real memory and thus reference more memory than is physically present in the computer.

Benefits of virtual memory include freeing applications from having to manage a shared memory space, increased security due to memory isolation, and being able to conceptually use more memory than is physically available, using the technique of paging.

Real-time Operating Systems

Preemptive OSes can switch processes at any time (usually when a higher-priority task is issued), while non-preemptive OSes wait for the running process to complete or to yield control to the kernel. Another term for non-preemptiveness is cooperative multitasking. The kernel is the part of the OS that determines the running process and performs process switches.

Low priority tasks running on preemptive platforms may still refuse to yield control with the use of critical sections.

Context switching is another word for process switching, since the set of registers that defines a process is known as its context.

Shared resources refer to parts of the system that should only be accessed by one process at a time. If not secured properly, multiple simultaneous accesses can result in race conditions, where the data of one process is overwritten by another process. To prevent this, we can use semaphores: before accessing the resource, a process waits on (decrements) the semaphore, and when it is done, it posts to (increments) the semaphore, signaling that other processes can again safely access the resource. The code that accesses such a shared resource is often called a critical region.

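A minimal sketch using POSIX semaphores (the worker function and counter are our own toy example):

#include <pthread.h>
#include <semaphore.h>

sem_t lock;              // binary semaphore guarding the shared resource
int shared_counter = 0;  // the shared resource

void *worker(void *arg)
{
    sem_wait(&lock);     // enter the critical region (decrements; blocks at 0)
    shared_counter++;    // safe: only one thread at a time gets here
    sem_post(&lock);     // leave the critical region (increments)
    return NULL;
}

int main(void)
{
    sem_init(&lock, 0, 1);  // initial value 1 gives mutual exclusion
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&lock);
    return 0;
}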

OS performance depends on several factors. The most important are context switching time and interrupt latency, referring to the time it takes to perform context switches and interrupt service routines, respectively.

Scheduling

A scheduling policy defines how processes are selected for promotion from the ready state to the executing state, i.e. when context switches happen. Different algorithms exist for this purpose. However, we must consider the scheduling overhead: the execution time required to run the scheduling algorithm and to perform the actual process switch.

Here are the most used priority-based scheduling algorithms:

Round Robin

Round Robin: Every process is only allowed to execute for a predefined time quantum. When this time is exhausted, the executing process is moved back to the ready state and the next ready process in the queue is selected for execution. Round Robin guarantees fairness, but does not guarantee that all processes will complete by their deadlines.

Rate-Monotonic Scheduling

Rate-Monotonic Scheduling (RMS): Often used in RTOSs. Given a set of periodically repeating processes, each process has an execution time and a period. RMS is a static scheduling policy, meaning that each process is assigned a priority that is never changed afterwards. The process with the shortest period is assigned the highest priority. All processes are put in the ready queue at t=0, and the timer starts. The process with the highest priority starts executing for its defined execution time. Every time the timer reaches a multiple of a process's period (1P, 2P, ..., tP), the process is again set to execute. If a process with higher priority is already executing, the new process has to wait. If the active process has lower priority, it is interrupted and the new process is executed.

A problem with RMS is that the processes may exceed the available CPU capacity. We can check this by adding up the total execution time of all the processes during a certain period. If the total execution time is larger than the period, no feasible schedule can be guaranteed. When a low-priority task is blocked because of overutilization, it is called starvation.
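
A commonly used feasibility test for RMS is the Liu and Layland utilization bound: a set of $n$ processes is guaranteed schedulable if

$$ \sum_{i=1}^{n} \frac{C_i}{T_i} \le n(2^{1/n} - 1) $$

where $C_i$ is the execution time and $T_i$ the period of process $i$. The bound approaches $\ln 2 \approx 0.69$ as $n$ grows, so a total utilization below about 69% is always safe under RMS.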

Earliest Deadline First

Earliest Deadline First (EDF): Dynamic scheduling policy, assigning the highest priority to the process which has the shortest remaining time to its deadline. This should not be confused with Shortest Remaining Time (SRT), which prioritizes shorter process times.

Which is best: RMS or EDF? EDF generally ensures higher CPU utilization, but RMS is better at ensuring that the deadlines are met.

What if deadlines are constantly failing? We have three options:

  • Get a faster CPU
  • Make it so that the processes require less execution time
  • Extend the deadlines for the system

Interprocess Communication

In general, processes can communicate with each other in two ways: blocking or nonblocking. The former means that the process enters the waiting state while awaiting a response. The latter means that the process can continue executing while waiting for a response.

The two major styles of communication are shared memory and message passing. Both are logically equivalent. The former means that communication happens through reading and writing to a shared resource, such as a shared memory location. The latter means that a message is actively sent to another process; a FIFO queue is commonly used for this purpose.

For UNIX, signals can also be used for interprocess communication. A signal is sent directly to the other process, without passing data except for the signal itself.

A mailbox is a simple mechanism for receiving messages. Some systems integrate a mailbox into the process itself. A mailbox can contain a message and a mail ready bit. When the message is delivered, mail ready is set to true. When the process reads the message, mail ready returns to false.
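
A minimal C sketch of such a one-slot mailbox (field and function names are our own):

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t message;
    volatile bool mail_ready;
} mailbox;

// Sender: deliver a message and raise the ready flag.
void mailbox_send(mailbox *mb, uint32_t msg)
{
    mb->message = msg;
    mb->mail_ready = true;
}

// Receiver: wait until mail arrives, consume it and lower the flag.
uint32_t mailbox_receive(mailbox *mb)
{
    while (!mb->mail_ready)
        ;  // busy-wait for delivery
    mb->mail_ready = false;
    return mb->message;
}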

Chapter 7 - Stuff

Not in the curriculum per 2015. Having section seven refer to chapter eight looked rather silly though, so I'm including this paragraph.

Chapter 8 - I$^2$C bus

The I$^2$C bus is a very common, low cost, easy to implement bus mostly used for linking microcontrollers into systems. The bus exists in two variations; the standard (supporting a transfer rate of up to 100 kbps) and the extended (transfer rate of up to 400 kbps).

An I$^2$C bus uses only two lines: a serial data line (SDL) for data, and a serial clock line (SCL) indicating when there is valid data on the SDL. Each node in an I$^2$C network is connected to both lines.

On the data link layer, every device connected to an I$^2$C bus must have a unique address. In the standard I$^2$C definition, an address is seven bits, while the extended definition allows for ten bits. The all-zero address 0b0000000 is used to signal a general call, addressing all devices connected to the bus.

A bus transaction consists of a series of one-byte transmissions on the SDL. Transactions are either reads or writes. In a read transaction, the master transmits the address it wants to read from, and the corresponding slave responds with the requested data. A write transaction also starts with the address, but the master follows it up with the data to write.

An address transmission consists of eight bits: seven for the address in question and one indicating the data direction (read/write).
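
For example (our own numbers): to read from the device at address 0x50, the master transmits the byte (0x50 << 1) | 1 = 0xA1; to write, it transmits 0xA0.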

So how does the SCL fit into this? A bus transaction is initiated by a start signal and ended with a stop signal.

  • Start signal - SDL goes 1->0 while SCL is held at 1
  • Stop signal - SDL goes 0->1 while SCL is held at 1

Written by

chrisdef Stian Jensen mariofrans Assios cristea oyvindrobertsen Ose duvholt larshb Oja andrewei kjetiaun gombos
Last updated: Wed, 9 Dec 2020 14:13:52 +0100.