TDT4186: Operating Systems
# Introduction
# Processes
A process is an instance of a computer program that is being executed. It contains the program code and its current memory state. A CPU can only execute a single process at a time. Multiprogramming is a way for CPUs to fake concurrency by rapidly switching between different processes. The scheduler, using a scheduling algorithm, decides which process gets to use a CPU at any given time.
## Process control
### CPU Modes of execution
### Process interruption
||**Mechanism** || **Cause** || **Use** ||
|| Interrupt || External to the execution of the current instruction || Reaction to an async external event ||
|| Trap || Associated with the execution of the current instruction || Handling of an error or an exception condition||
|| Supervisor call || Explicit request || Call to an operating system function. ||
### Process switching
|| Clock interrupt || Time slice is up, and process is switched.||
|| I/O interrupt || OS determines that an I/O action is complete, and moves all processes which were waiting for that I/O to the ready queue. ||
## Scheduling
A scheduling algorithm can either be preemptive or non-preemptive. A non-preemptive algorithm lets the active process decide when it is done using the CPU, while a preemptive algorithm can decide to switch processes at any time, that is, it can interrupt the currently running process.
### Scheduling Algorithms in Batch Systems
First-Come First-Served
: Non-preemptive. Active processes are run until they complete or block. New or unblocked processes are put at the back of the ready queue. Maximizes CPU usage.
Shortest Job First
: Non-preemptive. Uses statistics to make educated guesses about the run time of processes (see the estimation sketch below). The ready queue is a priority queue prioritizing the shortest jobs first. Minimizes turnaround time.
Shortest Remaining Time Next
: Preemptive. Like Shortest Job First, except a new process may take the place of the active process if its remaining time is lower than that of the active process.
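Shortest Job First and Shortest Remaining Time Next both need an estimate of each process's next run time. A common way to produce such an estimate (an illustrative assumption here, not something these notes prescribe) is exponential averaging of previously observed bursts. A minimal C sketch:

```c
#include <stdio.h>

/* Exponential averaging of CPU burst lengths, one common way to "guess"
 * the next run time for SJF/SRTN. alpha close to 1 weights recent
 * behaviour heavily; alpha close to 0 weights history heavily. */
static double next_estimate(double prev_estimate, double last_burst, double alpha)
{
    return alpha * last_burst + (1.0 - alpha) * prev_estimate;
}

int main(void)
{
    double estimate = 10.0;                    /* initial guess, in ms (made up) */
    double bursts[] = { 6.0, 4.0, 6.0, 13.0 }; /* observed bursts (made up) */

    for (int i = 0; i < 4; i++) {
        estimate = next_estimate(estimate, bursts[i], 0.5);
        printf("after burst %d: estimate = %.2f ms\n", i + 1, estimate);
    }
    return 0;
}
```

With alpha = 0.5 the estimate quickly tracks recent behaviour while still smoothing out single outliers.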
### Scheduling Algorithms in Interactive Systems
Round-Robin Scheduling
: Preemptive. Each process is assigned a time interval, or quantum. Processes that run longer than their quantum are switched out. Switching also occurs when the active process blocks or terminates. The quantum must be tuned to strike the best balance between response time and CPU utilization.
Priority Scheduling
: Preemptive. Processes are given a priority score, assigned statically or dynamically. The ready queue is a priority queue on process priority. May use quantums.
Shortest Process Next
: Preemptive. Estimates which process is shortest, and prioritizes accordingly.
Guaranteed Scheduling
: Preemptive. When there are n processes, each process gets 1/n of the CPU time.
Lottery Scheduling
: Preemptive. Prioritize processes at (optionally weighted) random.
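Lottery scheduling is simple to sketch: each process holds some number of tickets (its weight), and at every quantum the scheduler draws a random winning ticket. A minimal C sketch with made-up ticket counts:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct proc { const char *name; int tickets; };

/* Draw a random ticket and return the index of the process holding it. */
static int hold_lottery(const struct proc *procs, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += procs[i].tickets;

    int winner = rand() % total;   /* the winning ticket number */
    for (int i = 0; i < n; i++) {
        if (winner < procs[i].tickets)
            return i;
        winner -= procs[i].tickets;
    }
    return n - 1;                  /* not reached */
}

int main(void)
{
    struct proc procs[] = { { "A", 10 }, { "B", 5 }, { "C", 1 } };
    srand((unsigned) time(NULL));

    for (int slice = 0; slice < 5; slice++)
        printf("quantum %d goes to %s\n", slice, procs[hold_lottery(procs, 3)].name);
    return 0;
}
```

On average, process A above gets 10/16 of the quanta, B gets 5/16 and C gets 1/16.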
### Scheduling Algorithms in Real Time Systems
TBD.
## Threads
### Relationship between threads and processes
A process has the following:
- A virtual address space that holds the process image (program, data, stack).
- Protected access to processors, other processes (so it can communicate with other processes), files and I/O resources.
So what, then, is a thread? A process can hold one or more threads. A thread is an independent sequence of instructions being executed within the process. This means you can have one set of instructions for checking if there is a new message in your chat client, and another set of instructions which handles you sending a message. Each of these threads has the following:
- A thread execution state
- An execution stack, keeping track of where it is and what the rest of its instructions are
- Access to the memory and other resources of the process, shared with the other threads of the process
### Types of threads
There are different types of threads. Since thread management needs some degree of OS functionality, we must decide where that management lives and what the threads can access. There are therefore two broad categories of thread implementations:
#### User-level threads (ULT)
With pure ULT, all thread management is done by the application, typically through a threading library offered to the programmer. The kernel is not even aware that any threading is going on. The program usually starts with a single thread, then spawns more threads as it executes.
#### Kernel-level threads (KLT)
Windows is an example of a KLT system.
With KLT you have the obvious advantage that the kernel is aware of the threads, which means it can run threads of the same process on different processors. It also means that if one thread is blocked, the kernel can schedule another thread of the same process, so even if part of a program is waiting for, say, an I/O operation, you can still do other things, like interacting with the GUI.
Finally, KLT means you can multithread kernel routines.
On the other hand, transferring control from one thread to another within the same process requires a switch to kernel mode, which means that ULT is much faster (as it runs in user mode all the time).
#### Combined ULT-KLT
### Multicore and multithreading
According to Amdahl's law, if a fraction $f$ of a program can be parallelized while the rest is inherently serial, the speedup on $N$ processors is
$$\text{Speedup} = \frac{\text{time to execute program on a single processor}}{\text{time to execute program on N parallel processors}} = \frac{1}{(1-f) + \frac{f}{N}}$$
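As a worked example with made-up numbers: if $f = 0.9$ of a program can be parallelized and we run it on $N = 4$ processors,
$$\text{Speedup} = \frac{1}{(1 - 0.9) + \frac{0.9}{4}} = \frac{1}{0.1 + 0.225} = \frac{1}{0.325} \approx 3.1,$$
so even a mostly parallel program falls well short of the ideal speedup of 4 because of its serial fraction.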
## Concurrency
**Key terms in concurrency:**
|| Critical section || A section of code in a process which accesses shared resources, and which must not be executed while another process is in a corresponding section of code accessing the same resource. ||
|| Race condition || When the final result of a computation depends on which thread or process locks a resource first, and the relative timing between them. ||
### Deadlocks
A set of processes are deadlocked if each process in the set is waiting for an event that only another process in the set can cause. Four conditions must hold for there to be a deadlock:
|| Mutual exclusion || A resource can be assigned to at most one process at a time. ||
|| Hold and wait || Processes currently holding resources can request more resources. ||
|| No preemption || A process cannot have a resource taken away from it against its will. ||
|| Circular wait || There must be a circular chain of two or more processes, each waiting for a resource held by the next member of the chain. ||
Deadlock prevention can be done by preventing one or more of the four conditions above.
Removing a deadlock after it has occurred usually entails killing processes, or doing resource rollbacks.
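As a sketch of prevention: the circular-wait condition can be broken by making every process acquire resources in the same fixed global order. A minimal POSIX-threads sketch (the two mutexes and the ordering rule are illustrative assumptions, not from these notes):

```c
#include <pthread.h>

pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

/* Every thread that needs both resources must take lock_a before lock_b.
 * With a fixed global ordering there can be no circular chain of waits,
 * so the circular-wait condition (and hence deadlock) cannot arise. */
void *worker(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    /* ... use both shared resources ... */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```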
#### Methods for handling deadlocks
1. Allow system to enter deadlock and then recover.
2. Ensure that the system will never enter a deadlock.
3. Ignore the problem and pretend that deadlocks never occur in the system.
### Starvation
### Mutual exclusion
As mentioned in the introduction for concurrency: Mutual exclusion is a requirement for processes so that their read/write-operations do not interfere with each other. If two processes read or write from/to the same location in memory at the same time, then you'll have no clue about what you end up with there.
We'll look at a few hardware implementations of mutual exclusion:
#### Interrupt disabling
On a uniprocessor, a process can guarantee mutual exclusion by disabling interrupts before entering its critical section and re-enabling them afterwards: nothing can preempt it in between. This is costly, and it does not work when there are multiple processors.
#### Special machine instruction
The memory value is used as a key. A process that finds the memory value equal to, say, 0 may enter its critical section after exchanging the memory value with its own register key, which for some process may be 1. In this example the process inside its critical section now holds register key 0, while the memory location holds the value 1 after the exchange. When it's done, it resets the memory value.
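A minimal sketch of this exchange idea using C11 atomics (the variable and function names are mine, not from the notes): the lock word starts at 0 (free); a process atomically exchanges in a 1 and may only enter its critical section if it got a 0 back.

```c
#include <stdatomic.h>

atomic_int lock_word = 0;   /* the shared memory value: 0 = free, 1 = taken */

void enter_critical_section(atomic_int *lock)
{
    /* Atomically swap in 1. If the old value was 0 the lock was free and is
     * now ours; otherwise another process holds it, so busy-wait and retry. */
    while (atomic_exchange(lock, 1) != 0)
        ;   /* spin */
}

void leave_critical_section(atomic_int *lock)
{
    atomic_store(lock, 0);  /* reset the memory value so others may enter */
}
```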
### Process synchronization
#### Semaphores
A semaphore is an integer variable that is only manipulated through two atomic operations:
- `semWait(s)`, decrements the value. If the value becomes negative, the calling process blocks.
- `semSignal(s)`, increments the value. If the value is not positive, one process blocked on `semWait` is unblocked.
A variant of semaphores is the **binary semaphore**, which can only take the values 0 and 1. It has the following atomic operations:
- `semWaitB(s)`, if the semaphore is 0, the calling process is blocked. If it is 1, the value is set to 0 and the process continues.
- `semSignalB(s)`, if some process is blocked on the semaphore, one of them is unblocked. Otherwise the value is set to 1.
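As a concrete sketch, POSIX counting semaphores offer `sem_wait`/`sem_post`, which correspond to `semWait`/`semSignal` above. Initializing the semaphore to 1 makes it behave like a binary semaphore guarding a critical section:

```c
#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

sem_t mutex;            /* initialized to 1: behaves as a binary semaphore */
int shared = 0;

void *worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < 100000; i++) {
        sem_wait(&mutex);   /* semWait: blocks while the value is 0 */
        shared++;           /* critical section */
        sem_post(&mutex);   /* semSignal: lets one waiter continue */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    sem_init(&mutex, 0, 1);
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared = %d\n", shared);   /* always 200000 with the semaphore */
    return 0;
}
```

Without the semaphore the two threads would race on `shared` and the final value would be unpredictable.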
#### Monitors
A monitor has three characteristics:
1. The local data variables of the monitor are only accessible through the monitor's procedures, not by any external procedure.
2. A process enters the monitor by invoking one of its procedures.
3. Only one process may be executing in the monitor at a time.
Further, we need some way for a process to wait inside the monitor until something it needs has happened, and to alert waiting processes when it has. For this we have two operations on condition variables:
- `cwait(c)`, where c is some condition. The calling process is suspended until that condition is signalled.
- `csignal(c)`, resumes one process suspended after a `cwait` on that condition; if there is none, the signal is lost.
In the monitor we'll have queues for each wait-condition. When `csignal` is called on the condition, we take one process from the queue and let it do its stuff.
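C has no monitor construct, but the same pattern can be sketched with a pthread mutex (enforcing "only one process in the monitor at a time") and a condition variable playing the role of the per-condition queue with `cwait`/`csignal`. The names below are mine, not from the notes:

```c
#include <pthread.h>

/* "Monitor" state: the mutex ensures only one thread is inside at a time;
 * the condition variable plays the role of cwait/csignal. */
static pthread_mutex_t monitor_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  data_ready   = PTHREAD_COND_INITIALIZER;
static int ready = 0;

void monitor_wait_for_data(void)
{
    pthread_mutex_lock(&monitor_lock);           /* enter the monitor */
    while (!ready)
        pthread_cond_wait(&data_ready, &monitor_lock);  /* cwait(c) */
    ready = 0;                                   /* consume the data */
    pthread_mutex_unlock(&monitor_lock);         /* leave the monitor */
}

void monitor_signal_data(void)
{
    pthread_mutex_lock(&monitor_lock);
    ready = 1;
    pthread_cond_signal(&data_ready);            /* csignal(c): wake one waiter */
    pthread_mutex_unlock(&monitor_lock);
}
```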
#### Message passing
So we want to send messages between two processes, which means one needs to send the message, and another needs to receive it. There are many implementations of this, but it's usually something similar to:
- `send(destination, message)`
- `receive(source, message)`
- Blocking send and receive: Both are required to wait until message passing is done.
- Nonblocking send and receive: Both get to continue. Danger of losing messages, as a receive can have been executed before a send.
A message may have a header with type, source and destination, length and control information, like messages passed over the Internet. Control information can give information about which instruction the message is related to, or info about other messages coming in.
**Queueing discipline:** Here you have three options:
- FIFO
- Priority, e.g. by message type
- Let the receiver inspect the message queue and select which message to receive next
**Mutual exclusion:** We can implement mutual exclusion with message passing by having a mailbox for the resource. The mailbox starts out containing a single message. A process wanting the resource does a blocking receive on the mailbox; if the message is not available, the process is queued until it is. When the process is done with the resource, it sends the message back to the mailbox.
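A minimal single-process sketch of this token scheme using POSIX message queues (the queue name and token value are made up); with several cooperating processes, each would open the same queue and follow the same receive/use/send pattern:

```c
/* Compile on Linux with: cc mailbox.c -lrt */
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>

#define BOX_NAME "/resource_box"   /* illustrative queue name */

int main(void)
{
    struct mq_attr attr = { .mq_maxmsg = 1, .mq_msgsize = 1 };
    mqd_t box = mq_open(BOX_NAME, O_CREAT | O_RDWR, 0600, &attr);

    char token = 'T';
    mq_send(box, &token, 1, 0);        /* mailbox starts with one message */

    /* Acquire the resource: receive blocks until the token is available. */
    char buf[1];
    mq_receive(box, buf, sizeof(buf), NULL);
    printf("got the token, using the resource\n");   /* critical section */

    /* Release: return the token so another process can receive it. */
    mq_send(box, buf, 1, 0);

    mq_close(box);
    mq_unlink(BOX_NAME);
    return 0;
}
```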
# Memory management
The essential requirement of memory management is to provide a memory abstraction which allows dynamically allocating portions of memory to processes. This section details different memory management strategies.
Terms in memory management:
|| Frame || A fixed-size block of main memory. ||
|| Page || A fixed-size block of data in secondary memory, which can be copied temporarily to a frame of main memory. ||
__Physical representation:__ Physically, memory structures are usually stored in a tree-like hierarchy, with pointers to the next nodes.
## Memory partitioning
### Fragmentation
Fragmentation describes a situation where memory space is wasted due to some design decision. In other words: high fragmentation means a lot of wasted space in memory.
There are different ways to address locations in memory:
|| Logical address || A reference to a memory location independent of the current assignment of data to memory. Must be translated to a physical address. ||
|| Relative address || An address given relative to some known point, usually a value in a register. ||
|| Physical address || The absolute address of an actual location in main memory. ||
A process is given a private address space, which is a private area of memory denoted by a base address and a size limit. Processes use relative addresses which are translated to physical addresses by the OS.
### Memory management techniques
**No memory management technique:** The entire memory is addressable by every process. Usually a complete disaster when multiprogramming is involved. Therefore each process is given its own address space.
**Dynamic partitioning placement:** With dynamic partitioning, each process is allocated a contiguous block of exactly the size it needs. When a new process must be placed in a free block, common placement algorithms are (see the sketch below):
- Best fit: Place it in the free block closest in size to the request.
- First fit: Scan from the top, and place it in the first free block it fits in.
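The sketch below illustrates first fit over a made-up array of memory blocks (real allocators typically use free lists); best fit would instead scan all blocks and keep the smallest one that is still large enough:

```c
#include <stdio.h>

struct block { int start; int size; int free; };

/* First fit: scan from the top and take the first free block that is
 * large enough for the request. Returns the block index, or -1. */
static int first_fit(struct block *blocks, int n, int request)
{
    for (int i = 0; i < n; i++)
        if (blocks[i].free && blocks[i].size >= request)
            return i;
    return -1;   /* no free block is large enough */
}

int main(void)
{
    struct block mem[] = { { 0, 100, 0 }, { 100, 500, 1 }, { 600, 200, 1 } };
    int i = first_fit(mem, 3, 150);
    if (i >= 0)
        printf("placed request in block starting at %d\n", mem[i].start);
    return 0;
}
```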
**Buddy system:** In this system, memory blocks are available in sizes $2^K$, where $L \leq K \leq U$, and where
$$ 2^L = \text{smallest size block that is allocated} $$
$$ 2^U = \text{largest size block that is allocated} $$
This means that $2^U$ usually is equal to the memory space available for allocation.
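A small sketch of the size calculation: a request is rounded up to the nearest power of two within $[2^L, 2^U]$. The constants below are made up:

```c
#include <stdio.h>

#define L 4    /* smallest block: 2^4  = 16 bytes    (illustrative) */
#define U 10   /* largest block:  2^10 = 1024 bytes  (illustrative) */

/* Return the buddy-system block size used to satisfy a request, i.e. the
 * smallest power of two >= request, but at least 2^L.
 * Returns 0 if the request is larger than the whole 2^U space. */
static unsigned buddy_block_size(unsigned request)
{
    unsigned size = 1u << L;
    while (size < request) {
        if (size >= (1u << U))
            return 0;
        size <<= 1;
    }
    return size;
}

int main(void)
{
    printf("%u\n", buddy_block_size(100));  /* -> 128                 */
    printf("%u\n", buddy_block_size(5));    /* -> 16 (minimum block)  */
    printf("%u\n", buddy_block_size(2000)); /* -> 0 (too large)       */
    return 0;
}
```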
### Virtual Memory
To release the programmer and the user from the constraints of physical memory, we let them perceive a much larger memory. One part of it resides in main memory (called _real memory_), and another part is located on disk and is called _virtual memory_.
This is used to decrease swapping times, since only the pieces of a process that are actually needed have to be resident in main memory.
|| Virtual address || The address assigned to a location in virtual memory, so that it can be accessed as though it were part of main memory. ||
|| Virtual address space || The virtual storage assigned to a process. ||
|| Address space || The range of memory addresses available to a process. ||
|| Real address || The address of a storage location in main memory. ||
**Virtual memory paging:** The process is divided into fixed-size pages, which are loaded into frames of main memory only as they are needed.
**Virtual memory segmentation:** The process is divided into variable-size segments (e.g. code, data, stack), which are loaded into main memory as needed.
## Page Replacement Algorithms
Now we've looked at memory management techniques which can be used. Let's go further and look at how they combine with the actual swapping. You see, if a process in memory is not ready to be executed, it takes up space in memory. So you want to swap out processes that are not ready in favor of those that are; those that aren't ready are dumped to secondary storage.
Swapping takes time, especially when using an HDD instead of an SSD. There are several algorithms that can decide what to swap out for us, each with its own upsides and downsides.
### Not Recently Used (NRU)
Pages are given two flags: referenced and modified. Each time a page is read from, the referenced flag is set. Each time a page is written to, the modified flag is set. Every now and then (e.g. every 20 msec), a clock signal triggers the resetting of the referenced flag. Based on these flags, we can classify the pages at any given time in one of four classes:
||Class 0 || Not referenced, not modified.||
||Class 1 || Not referenced, modified.||
||Class 2 || Referenced, not modified.||
||Class 3 || Referenced, modified.||
NRU swaps out a random page from the lowest-numbered non-empty class.
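A sketch of the selection itself, assuming each page carries its referenced (R) and modified (M) flag so that the class number is simply $2R + M$; the page-table layout below is made up:

```c
#include <stdio.h>
#include <stdlib.h>

struct page { int referenced; int modified; };

/* Pick a victim according to NRU: a random page from the lowest-numbered
 * non-empty class, where class = 2*R + M. */
static int nru_victim(const struct page *pages, int n)
{
    for (int cls = 0; cls <= 3; cls++) {
        int candidates[64], count = 0;
        for (int i = 0; i < n && count < 64; i++)
            if (2 * pages[i].referenced + pages[i].modified == cls)
                candidates[count++] = i;
        if (count > 0)
            return candidates[rand() % count];
    }
    return -1;   /* no pages at all */
}

int main(void)
{
    struct page pages[] = { {1, 1}, {1, 0}, {0, 1}, {1, 1} };
    printf("evict page %d\n", nru_victim(pages, 4));  /* page 2: class 1 */
    return 0;
}
```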
### First-In, First-Out (FIFO)
Oldest page gets swapped out first.
### Least Recently Used (LRU)
Swaps out the page which has remained unused for the longest time.
### WSClock
```
t = some constant, e.g. 120 msec

on page fault:   // a page which is not in memory is requested; it gets loaded on the fly
    if current page is referenced:
        set current page as not referenced
        set current page to next page
        restart procedure
    else:
        if time since current page's last use is larger than t:
            if current page is modified:
                schedule disk write for current page
                set current page to next page
                restart procedure
            else:
                swap out current page
        else:
            // page was used within the last t: it is in the working set, keep it
            set current page to next page
            restart procedure
```
This means it will replace the first page encountered which has neither been referenced nor modified within the last time t. Pages before it that were referenced are marked as not referenced, and pages that are modified are scheduled for a disk write.
Additionally, we must consider the case where the current page pointer completes one entire circle in the circular list of frames.
In the case where at least one write has been scheduled, the algorithm simply continues until the write is complete, at which point the newly-written page will be ready for eviction.
On the other hand, if no writes have been scheduled, the first non-modified page will be swapped out. If there are no non-modified pages, the current page gets swapped out.
# File systems
A file system is an abstraction to store, retrieve and update a set of files. A file is a logical unit of information created by a process. Information stored in a file is persistent, and can be accessed by different processes.
## File Structure
There are many different ways to logically structure a file. Here is a list of three common structures:
Byte sequence
: An unstructured sequence of bytes; the OS doesn't know or care what is in the file. This is what UNIX and Windows use.
Record sequence
: Crap. Reads and writes entire fixed-size records at once. Old, punch card-based systems used this, but it is really unpopular now.
Tree
: Tree of variable size records, allows quick lookup on a specific key. Some mainframes use this.
## File Types
Operating systems often support different file types. Here are some common file types:
|| Regular files || Good ol' regular binary files. ||
|| Directories || System files for maintaining the structure of the file system. ||
|| Character special files || IO-devices masquerading as byte-sequenced files to abstract writing and reading to and from IO devices. ||
|| Block special files || IO-devices masquerading as record-sequenced files to abstract writing and reading to and from IO devices. ||
## Directories
A file system usually organizes its files in some sort of directory organization. Here are some different ways to organize files:
|| Single-Level Directory Systems || Every file is in one large root directory. Often used on simple embedded devices. ||
|| Hierarchical Directory Systems || Allows directories to contain other directories, creating a directory tree. Tree directory systems allow for sane organization of files. ||
## File System Layout
A file system is typically stored on a partition on a disk. Every partition starts with a boot block, but other than that, layout on disk can vary wildly from file system to file system.
|| Boot block || Here, the boot block contains administrative data about the file system, in addition to the bootstrapping program which is run if an operating system attempts to boot the partition. ||
|| File allocation table || The file allocation table (FAT) contains a list of all the clusters/chunks in the file data area, together with a pointer to the next cluster for each cluster. ||
|| Root directory || The root directory is the statically allocated logical root of the file system, containing files and subdirectories. ||
|| File data area || The file data area contains the rest of the data, organized in clusters as referenced from the file allocation table. ||
# I/O
## Organization of the I/O-function
With _programmed I/O_ the processor issues an I/O command, then busy-waits until the I/O operation completes. This is obviously inefficient.
With _interrupt-driven I/O_ the processor issues the I/O command. If the instruction is _nonblocking_, the process can continue executing and is interrupted when the I/O operation is done. If the operation is _blocking_, the process is put in a blocked state and is not scheduled again until the I/O operation completes.
### Direct memory access (DMA)
With _Direct memory access (DMA)_ you have a DMA module that controls the exchange of data between main memory and the I/O-module.
In this case the processor sends a transfer request to the DMA module, and is interrupted only when the DMA module is done and the entire block has been transferred.
There are three major designs you can use to implement DMA. You can have the DMA module, CPU and I/O modules all on one bus, but this is obviously not very efficient. Instead you can have one system bus, but give the DMA module a direct path to the I/O devices; this may be called integrated DMA and I/O.
A third option is to have one system bus, and a separate I/O bus between the DMA module and the I/O devices. That is, not a direct path to each I/O device, but one common bus for all of them.
## I/O Buffering
I/O also has buffering: the OS performs input transfers into a buffer in main memory in advance of requests, and performs output transfers from a buffer some time after the requests. This is done for two reasons:
- A program needing I/O must wait for the I/O to complete. This can be either by busy waiting or via interrupts, but it has to wait in both cases. I/O devices are relatively slow compared to the rest of the computer, so the wait can be very long.
- If data is transferred directly to or from a user process's memory, it interferes with the OS's swapping of processes: the OS cannot swap a process out of memory while some of its memory locations are locked for an ongoing block transfer.
It's important to note that I/O devices transfer and store data differently. Some devices are _block-oriented_, which means they store and transfer data in blocks; drives, USB keys and other secondary storage do this. Other devices are _stream-oriented_, which means they transfer data in and out as a stream of bytes; printers, communication ports (e.g. LAN), mice, keyboards and other pointing devices do this (and mostly everything that's not secondary storage). For stream-oriented devices you can transfer either a byte or a line at a time.
Now we have three ways of implementing this:
- With a _single buffer_ the OS assigns a part of main memory to the I/O operation when it is requested. It reads a block into the buffer, then transfers it to user space, and immediately starts reading the next block. This is called _reading ahead_.
- With a _double buffer_ the process can transfer to/from one buffer while the OS empties or fills the other. This means it can, e.g., read and write data at the same time.
- The third option is a _circular buffer_, which is simply more than two buffers.
By using buffering we can smooth out peaks in I/O demand, but no amount of buffering lets an I/O device keep up with a process indefinitely. It will increase performance, but it isn't perfect.
## Disk scheduling
### Disk properties
Disks are very slow, especially compared to other components in the computer.
|| Seek time || Time it takes to position the head at the right track. ||
|| Rotational delay || Time it takes for the appropriate sector to line up with the head. ||
|| Access time || Sum of seek time and rotational delay. ||
|| Transfer time || The time it takes to perform the read or write operation: $T = \frac{b}{rN}$, where $T$ is the transfer time, $b$ is the number of bytes to be transferred, $N$ is the number of bytes on a track, and $r$ is the rotation speed (rev/s). See the worked example below. ||
As data is often spread over different tracks (seemingly at random), you cannot rely on just improving the specs of the disks; we need some extra techniques to help us.
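As a worked example with made-up numbers: a disk spinning at 7200 rpm gives $r = 120$ rev/s; with $N = 256\,000$ bytes per track and a transfer of $b = 4096$ bytes,
$$T = \frac{b}{rN} = \frac{4096}{120 \cdot 256\,000} \approx 1.3 \cdot 10^{-4}\ \text{s} \approx 0.13\ \text{ms},$$
which is tiny compared to typical seek times of several milliseconds. This is why the scheduling policies below focus on reducing seeks.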
### Disk Scheduling policies
The system spends most of its time on seeks, since sector access involves selecting tracks more or less at random, so this is what we need to improve. How about helping the disk by letting the OS order the I/O requests in some way, so that the seek time is reduced?
Clever. There are several policies for this, falling into two groups: one group uses the requested item (where the data lies on disk) to decide which request to serve first, while the other group does not depend on the requested item at all. We'll start with the latter group.
#### Selection according to requestor
|| Random || A benchmark used to compare these policies is _random scheduling_, which takes a random I/O request and passes it to the disk. ||
|| First-in-first-out (FIFO) || A simple queue: new requests at the end, and the oldest is served first. This one's the fairest of them all, as the oldest always goes first. It performs well if the requests are to clustered file sectors, but it will often approximate random scheduling in performance. ||
|| Priority (PRI) || Perform I/O in order of the priority of the process which sent the request. This leaves control over I/O operations outside the disk queue management, so it's not used to optimize disk utilization, but rather when other goals of the system are more important. ||
|| Last in first out (LIFO) || A stack: new requests at the top, and the newest is served first. This means there'll be little to no arm movement (a process often sends successive requests to nearby sectors), which increases throughput. There is a possibility of starvation though, if the request load is high. ||
#### Selection according to requested item
|| Shortest service time first (SSTF) || Selects the I/O request which requires the least movement of the disk arm, i.e. always the minimum seek time from the current position (see the sketch after these tables). As the arm can go both ways, the distance may be the same to two requested locations; in that case it picks a direction at random. It performs better than FIFO, but it does not guarantee that the average seek time is minimal (it is greedy and only looks one step ahead). ||
|| SCAN || The arm only moves in one direction at a time, and satisfies all requests in that direction until it reaches the last track, then reverses. Prevents starvation. Performs similarly to SSTF unless the request pattern is very unusual, but it does not exploit locality as well as SSTF, as it is biased against the area it has most recently traversed. ||
|| SCAN with LOOK policy || Same as SCAN, but as the arm moves in a direction it does not have to go all the way to the last track; it can reverse at the last request in that direction. _It looks ahead for requests._ ||
SCAN (with or without LOOK) still has some problems: it favors the latest-arriving jobs, and it favors requests for tracks nearest to the innermost and outermost tracks. We need to look further. The following algorithms address these issues.
|| C-SCAN (circular SCAN) || Scans in one direction only, and jumps back to the start when it reaches the end. This reduces the maximum delay that new requests experience. With SCAN, if the expected time for a sweep from inner to outer track is $t$, the expected service interval for sectors at the periphery is $2t$; with C-SCAN the interval is on the order of $t + s_{\text{max}}$, where $s_{\text{max}}$ is the maximum seek time. ||
|| N-step-SCAN || Segments the request queue into subqueues of length $N$. Queues are processed one at a time using SCAN, and new requests are added to a new queue. Approaches SCAN in performance for large $N$; performs like FIFO for $N = 1$. ||
|| FSCAN || Two queues. When a SCAN starts, it serves all requests in one queue, while the other queue is empty and takes all new requests. The new requests thus have to wait for the next SCAN. ||
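As promised above, a minimal SSTF sketch: given the current head position and the pending track requests, serve the request with the smallest arm movement next. The request numbers are made up:

```c
#include <stdio.h>
#include <stdlib.h>

/* Shortest Service Time First: return the index of the pending request
 * closest to the current head position (minimum seek distance). */
static int sstf_next(const int *tracks, int n, int head)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (abs(tracks[i] - head) < abs(tracks[best] - head))
            best = i;
    return best;
}

int main(void)
{
    int pending[] = { 98, 183, 37, 122, 14, 124, 65, 67 };
    int head = 53;
    int next = sstf_next(pending, 8, head);
    printf("head at %d, serve track %d next\n", head, pending[next]);  /* 65 */
    return 0;
}
```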
## I/O Caching
Disk cache is a buffer in main memory for disk sectors which exploits the principle of locality. When a request for a particular sector arrives, the OS checks if the requested data is in the cache; if it is, the request is serviced from the cache, and if it isn't, the sector is loaded into the cache. There's a huge design consideration when it comes to caching: which sector in the cache should be replaced?
|| **Name** || **Description** || **Implementation** ||
|| Least recently used (LRU) || Replace the least recently used, that is the block which has been in memory longest without any reference to it. || A stack of pointers. Top is the most recently used block, while the bottom is the least recently used. When a block is used, move it to the top of the stack. When you need to swap, remove the bottom one, and push the new one on the top of the stack. ||
|| Least frequently used (LFU) || Replace the least frequently used. || Have a counter on each block. Increment on use. ||
LFU may seem good, but it can be misleading. Say that, because of the principle of locality, a block is suddenly used a lot over a short period and then not at all. You then have a huge counter for a block which may not be used again for a long time. What now? This led to FBS, which combines LRU and LFU.
**Frequency-based stack (FBS)**
You have a stack as with LRU, and a count on each block as with LFU. The most recently used block is at the top of the stack, and the least recently used at the bottom. The stack is divided into three sections: "New", "Middle" and "Old". Only blocks in the "Old" section are eligible for replacement. Reference counts are not incremented for blocks in the "New" section; that is, a block's count is only incremented when it is referenced while in the "Middle" or "Old" section.
# Multicore and multicomputers
## UMA (uniform memory access)
Every memory word can be accessed by every CPU with the same access time.
## NUMA (non-uniform memory access)
All memory is still visible to all CPUs, but each CPU has local memory that is faster to access than memory attached to other CPUs.
## Multi CPU OS implementations
|| __Master/slave__ || One CPU is responsible for handing out tasks to the other CPUs. Scales badly, as message passing takes up more and more time and space on the bus as the number of CPUs increases. For small systems, however, it's a simple and workable model. ||
|| __To each their own__ || Each CPU has its own private OS (i.e. they can even be different), though the OS code can be shared. However, __a huge minus__ is the lack of a shared I/O buffer: if one CPU performs an I/O operation, other CPUs won't be able to, for instance, read from its I/O buffer and thereby share the workload. For this reason this model is rarely, if ever, used. ||
|| __Symmetric multiprocessor (SMP)__ || All CPUs share one copy of the same OS, and each can take on system tasks. Makes for a complicated OS (especially for a general __n__-CPU system), and synchronization issues pose a significant challenge. Still, this is the most widely used model, as it scales well. ||
## Synchronization
|| Test-and-set (TSL) instruction || Check to see if memory is locked. If not, lock it and then read or write, all treated as one atomic operation. ||
|| Peterson's algorithm || A software-only solution to mutual exclusion. ||
## Scheduling
Multiprocessor scheduling decides both which thread to run and which CPU to run it on.
### Time sharing
Makes use of a shared priority queue (which may become a bottleneck).
|| Smart scheduling || Critical threads (e.g. those holding a spin lock) are allowed to keep running instead of being switched out, so they finish their critical work quickly. ||
|| Affinity scheduling || Have same CPU run the same thread again for good cache and TLB use. ||
### Space sharing
- Related threads are run simultaneously on dedicated CPUs.
### Gang scheduling
Related threads run at the same time across the CPUs. Useful for cooperating threads.
## Intercommunication
### Message passing
### Process distribution
# Virtualization
A virtual machine is a software computer. It is less critical if it crashes, as it's just a program that can be killed and restarted.
|| Type 1 hypervisor || Runs directly on the hardware, without an OS underneath. When a guest OS makes a system call, it traps to the hypervisor. ||
|| Type 2 hypervisor || Runs on top of a host OS (VirtualBox, VMware). Runs normal instructions directly on the hardware, but has to emulate sensitive instructions. ||
|| Paravirtualization || The guest OS is modified so that sensitive instructions are replaced with hypercalls (calls to the hypervisor). ||