TDT4186: Operating Systems
An operating system (OS) is a collection of software that provides an abstract machine on top of a physical machine. The OS provides resource management. An OS makes a computer easier to use for end users and programmers. Typical services provided by an OS include storage/file systems, process/thread management, security and communications.
A process is an instance of a computer program that is being executed. It contains the program code and its current memory state. A CPU can execute a single process at a time. Multiprogramming is a way for CPUs to fake concurrency by switching between different processes, often quickly. The Scheduler, using a scheduling algorithm, decides which process gets to use a CPU at any given time.
CPU Modes of execution
A processor usually has two modes in which it can execute a process: user mode, and a more privileged kernel mode. User and kernel mode are separated from each other so that the system tables are safe from interference by user programs.
|Interrupt||External to the execution of the current instruction||Reaction to an async external event|
|Trap||Associated with the execution of the current instruction||Handling of an error or an exception condition|
|Supervisor call||Explicit request||Call to an operating system function.|
Conditions on which the OS takes control. CPU may switch to another process.
|Clock interrupt||Time slice is up, and process is switched.|
|Memory fault||A memory reference points to something that isn't in main memory. The OS issues an I/O request to fetch what's missing, and blocks the process in the meantime.|
|I/O interrupt||OS determines that an I/O action is complete, and moves all processes which were waiting for the IO to the ready-queue.|
Scheduling algorithms should be fair and balanced. For batch systems (systems which executes a set of jobs without interruptions), the goal is to maximize process throughput, minimize turnaround time, and maximize CPU utilization. For interactive systems (where there is interaction between user and system), the goal is to minimize response time, and ensure proportionality in latency (meeting the user's expectations). For real time systems (systems in which jobs depend on time), the goal is to meet deadlines and behave predictably.
A scheduling algorithm can either be preemptive or non-preemptive. A non-preemptive algorithm lets the active process decide when it is done using the CPU, while a preemptive algorithm can decide to switch processes at any time; that is, it can interrupt the currently running process.
Scheduling Algorithms in Batch Systems
- First-Come First-Served
- Non-preemptive. The active process runs to completion or until it blocks. New or unblocked processes are put at the back of the ready queue. Maximizes CPU usage.
- Shortest Job First
- Non-preemptive. Uses statistics to make educated guesses about run-time of processes. The ready queue is a priority queue prioritizing the shortest jobs first. Minimizes turnaround.
- Shortest Remaining Time Next
- Preemptive. Like Shortest Job First, except a new process may take the place of the active process if its remaining time is lower than that of the active process.
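The batch algorithms above differ mainly in how the ready queue is ordered. A rough sketch of non-preemptive Shortest Job First, computing average turnaround time for a made-up set of jobs that all arrive at time 0:

```python
# Sketch of non-preemptive Shortest Job First for a batch of jobs that
# all arrive at time 0. The burst times are made up for illustration.
def sjf_turnaround(bursts):
    """Run jobs shortest-first; return average turnaround time."""
    order = sorted(bursts)          # shortest job first
    clock, total = 0, 0
    for burst in order:
        clock += burst              # job runs to completion
        total += clock              # turnaround = completion time - arrival (0)
    return total / len(bursts)

# Three jobs with burst times 6, 8 and 3:
# SJF runs them as 3, 6, 8 -> completions 3, 9, 17 -> average (3+9+17)/3
print(sjf_turnaround([6, 8, 3]))
```

Running the same jobs in arrival order (FCFS) gives completions 6, 14, 17 and a worse average, which is exactly why SJF minimizes turnaround.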
Scheduling Algorithms in Interactive Systems
- Round-Robin Scheduling
- Preemptive. Each process is assigned a time interval, or quantum. Processes that run longer than quantum are switched. Switching also occurs when the active process becomes blocked, or terminates. The quantum time must be tweaked to allow the best balance between response time and CPU utilization.
- Priority Scheduling
- Preemptive. Processes are given a priority score, assigned statically or dynamically. The ready queue is a priority queue on process priority. May use quantums.
- Shortest Process Next
- Non-preemptive. Estimates which process is shortest (e.g. from its past behaviour), and prioritizes accordingly.
- Guaranteed Scheduling
- Preemptive. When there are n processes, each process is guaranteed 1/n of the CPU cycles.
- Lottery Scheduling
- Preemptive. Prioritize processes at (optionally weighted) random.
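A minimal sketch of Round-Robin with a fixed quantum (the burst times are made up); it records which job runs, and for how long, at each step:

```python
from collections import deque

# Sketch of Round-Robin scheduling with a fixed quantum.
def round_robin(bursts, quantum):
    """Return the order in which (job index, slice length) pairs run."""
    ready = deque(enumerate(bursts))
    schedule = []
    while ready:
        job, remaining = ready.popleft()
        run = min(quantum, remaining)
        schedule.append((job, run))
        if remaining > run:                 # quantum expired: back of queue
            ready.append((job, remaining - run))
    return schedule

# Jobs with burst times 5, 3 and 1, quantum 2:
print(round_robin([5, 3, 1], 2))
# [(0, 2), (1, 2), (2, 1), (0, 2), (1, 1), (0, 1)]
```

Note how a larger quantum would degenerate toward FCFS, while a tiny quantum would waste time on switching; that is the balance the text mentions.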
Scheduling Algorithms in Real Time Systems
Rate monotonic scheduling (RMS)
RMS is used with periodic tasks. The highest priority task is the one with the shortest period. RMS is guaranteed to succeed if the total utilization satisfies U = Σ Cᵢ/Pᵢ ≤ n(2^(1/n) − 1), where Cᵢ is the computation time and Pᵢ the period of task i, and n is the number of tasks.
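The schedulability bound can be checked in a few lines. The task set below (computation time, period) is made up; the test is the standard Liu and Layland bound Σ Cᵢ/Pᵢ ≤ n(2^(1/n) − 1):

```python
# Liu & Layland schedulability test for RMS: a set of n periodic tasks is
# guaranteed schedulable if sum(C_i / P_i) <= n * (2**(1/n) - 1).
def rms_schedulable(tasks):
    """tasks: list of (computation_time, period) pairs."""
    n = len(tasks)
    utilization = sum(c / p for c, p in tasks)
    bound = n * (2 ** (1 / n) - 1)
    return utilization <= bound

print(rms_schedulable([(1, 4), (1, 5), (2, 10)]))  # U = 0.65, bound ~0.78
```

The bound is sufficient but not necessary: a task set above the bound may still be schedulable, but then RMS gives no guarantee.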
We've talked about processes, and you might think they're cool. You might also think it would be even cooler if a process could do more than one related thing at the same time: a chat program, for instance, could let you send and receive messages simultaneously. Well, that's possible. Let us look at threads.
Relationship between threads and processes
So couldn't you just have two processes for the example above? Well, yes. But with threads it's both easier and faster. Let's see what the differences are.
A process has the following:
- A virtual address space that holds the process image.
- Protected access to processors, other processes (so it can communicate with other processes) and I/O resources.
So what is then a thread? Well, a process can hold one or more threads. A thread is a set of instructions. This means you can have one set of instructions for checking if there is a new message in your chat client, and another set of instructions which handles you sending a message. Each of these threads has the following:
- A thread execution state
- A saved context for when the thread is not running, which includes a program counter pointing at its next instruction.
- An execution stack, which keeps track of where the thread is in its execution.
- Some storage for the local variables in that thread.
- Access to memory and other resources of the process. This is shared with the other threads of the process.
This means that the process itself has a process control block and a user address space, while each thread has its own user and kernel stack, in addition to a thread control block, which holds the register values, priority and related state for that thread. A single-threaded process does not need a thread control block, as it has no other threads to keep separate.
Why use threads? One could just use multiple processes, and let them communicate? Well, there are a few benefits of threads: It's faster! It takes less time to both create and terminate a thread than a process, as the overhead of a thread is smaller. It also takes less time to switch between threads in the same process. It's also far more efficient to communicate between threads than between processes.
Types of threads
There are different types of threads. Since thread management can be done either by the application or by the kernel, implementations differ in what the threads can access of OS functionality. There are two broad categories of thread implementations:
User-level threads (ULT)
With pure ULT all of the thread management is done by the application. This means that there is a threading library offered to the programmer. Here, the kernel is not even aware that there is any threading going on. The program usually starts with a single thread, then spawns threads as the program executes.
Advantages of ULT are that thread switching does not require a switch to kernel mode, that the scheduling of the threads can be specific to each application, and that ULT works on any OS, as all it needs is a threading library.
There are also disadvantages. In a typical OS many system calls are blocking, which means that when a ULT executes a blocking system call, that thread and all other threads in its process are blocked. Also, pure ULT cannot take advantage of multiprocessing: the kernel assigns a process to only one processor at a time, and it isn't aware that there are separate threads of execution that could be placed on different processors.
A solution to the blocking system call problem is jacketing: converting a blocking system call into a nonblocking system call through the threading library. E.g. instead of calling the system I/O routine directly, the thread calls an application-level I/O jacket routine.
Kernel-level threads (KLT)
This is the complete opposite. As you might have guessed, in a pure KLT system the kernel manages threading completely. There is no thread management at application level, just an API to the kernel's thread facility. Windows is an example of a KLT system.
With KLT you have the obvious advantage of the kernel being aware of the threads, which means threads can run on different processors. It also means that if one thread blocks, the kernel can schedule another thread of the same process; so even if one part of a program is waiting for, say, an I/O operation, you can still do other things, like interacting with the GUI. Finally, KLT means kernel routines themselves can be multithreaded.
On the other hand, if you want to transfer control from one thread to another of the same process you'll have to be in kernel mode, which means that ULT is much faster (as it runs in user-mode all the time).
So what's good about each of the two? Well, KLT has an obvious speedup as it can place threads on different cores, while ULT does not have to mode switch. Can they be combined?
Yes, they can. One implementation is to create all threads as ULTs, then map them to a smaller or equal number of KLTs (every ULT has an associated KLT, and every KLT has one or more ULTs). Solaris has this implementation.
Multicore and multithreading
You might wonder about the possible speedup of multithreading when run on different cores. Amdahl's law gives us an expression which helps us: if a fraction f of a program can be parallelized and it runs on n cores, the speedup is 1 / ((1 − f) + f/n).
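A quick sketch of plugging numbers into Amdahl's law (the 90% figure is an arbitrary example):

```python
# Amdahl's law: with a fraction f of a program parallelizable over n
# cores, speedup = 1 / ((1 - f) + f / n).
def amdahl_speedup(f, n):
    return 1 / ((1 - f) + f / n)

# 90% parallelizable code on 8 cores:
print(amdahl_speedup(0.9, 8))   # ~4.71x, far from the ideal 8x
```

The serial fraction dominates quickly: even with infinitely many cores, 90% parallelizable code can never exceed a 10x speedup.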
Key terms in concurrency:
|Atomic operation||A sequence of instructions that is indivisible: either the whole sequence executes, or none of it does.|
|Critical section||A section of code in which a process accesses shared resources, and which therefore must not execute while another process is in a corresponding section of code accessing the same resource.|
|Mutual exclusion||The requirement that only one process can be in a critical section and access a shared resource at a time.|
|Livelock||Two or more processes are continuously changing their state in response to change(s) in other process(es), without doing any useful work themselves.|
|Race condition||When the final result of a computation depends on which thread or process locks a resource first, and the relative timing between them.|
|Starvation||A runnable process (or thread) never gets to execute its next instruction because it is perpetually denied the resources it needs, e.g. other processes are always picked ahead of it.|
|Deadlock||Two or more processes cannot proceed because each is waiting for one of the others to do something (e.g. release a resource).|
We'll first start by looking at a few challenges with concurrency, and move on to look at implementations.
A set of processes are deadlocked if each process in the set is waiting for an event that only another process in the set can cause. Four conditions must hold for there to be a deadlock:
|Mutual exclusion||A resource can be assigned to at most one process at a time.|
|Hold and wait||Processes currently holding resources can request more resources.|
|No preemption||A process cannot have a resource taken away from it against its will.|
|Circular wait||There must be a circular chain of two or more processes, each waiting for a resource held by the next member of the chain.|
Deadlock prevention can be done by preventing one or more of the four conditions above.
Removing a deadlock after it has occurred usually entails killing processes, or doing resource rollbacks.
Methods for handling deadlocks
Deadlock Prevention Design the system in such a way that the possibility of deadlock is excluded. Can be done indirectly by preventing hold-and-wait and/or no-preemption, or directly by preventing circular wait.
Deadlock Avoidance Allows conditions 1 to 3, but assures that the deadlock point is never reached. Can be done by process initiation denial or resource allocation denial.
Deadlock Detection Determine if a deadlock exists and resolve it. Search for deadlocks periodically. Recovery strategies are (in increasing sophistication):
- Abort all deadlocked processes
- Back up each deadlocked process to some previously defined checkpoint
- Successively abort deadlocked processes until deadlock no longer exists
- Successively preempt resources until deadlock no longer exists
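Detection itself often comes down to finding a cycle of waiting processes, i.e. the circular-wait condition. A sketch using a wait-for graph (the process names are made up):

```python
# Sketch of deadlock detection: build a wait-for graph (process -> set of
# processes it waits on) and look for a cycle with a depth-first search.
def has_deadlock(wait_for):
    visited, on_path = set(), set()

    def dfs(p):
        visited.add(p)
        on_path.add(p)
        for q in wait_for.get(p, ()):
            if q in on_path or (q not in visited and dfs(q)):
                return True          # q is on the current path: a cycle
        on_path.discard(p)
        return False

    return any(dfs(p) for p in wait_for if p not in visited)

# A waits for B, B waits for A: circular wait -> deadlock
print(has_deadlock({"A": {"B"}, "B": {"A"}}))   # True
print(has_deadlock({"A": {"B"}, "B": set()}))   # False
```

Real detectors work on resource allocation matrices rather than a bare wait-for graph, but the cycle search is the same idea.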
Starvation is a problem encountered in concurrent computing where a process is perpetually denied necessary resources to process its work.
Starvation is usually caused by an overly simplistic scheduling algorithm. For example, if a (poorly designed) multi-tasking system always switches between the first two tasks while a third never gets to run, then the third task is being starved of CPU time. The scheduling algorithm, which is part of the kernel, is supposed to allocate resources equitably; that is, the algorithm should allocate resources so that no process perpetually lacks necessary resources. (Wikipedia)
As mentioned in the introduction for concurrency: Mutual exclusion is a requirement for processes so that their read/write-operations do not interfere with each other. If two processes read or write from/to the same location in memory at the same time, then you'll have no clue about what you end up with there.
We'll look at a few hardware implementations of mutual exclusion:
Interrupt disabling
Usually a process can be interrupted at any time, and thereby thrown out in the middle of its critical section while some other process gets at the resource. The simple fix: let the process disable interrupts before entering a critical section, then enable them again after the section. This only works on systems where only one process runs at a time, that is, uniprocessor systems.
Special machine instruction
So what happens if you have a multiprocessor system? We can introduce a special machine instruction that executes atomically: while it runs, the hardware locks out all other accesses to the memory location involved. These are the two most commonly implemented instructions:
Compare & swap You pass the instruction an address, a test value and a new value. If the current value of the address is the test value, then you replace the value with the new value. If that condition is false, you leave the addressed value unchanged. This operation is not subject to interruption.
This means that when a process wants to change a value, it first tests whether the address holds the value it expects. If the value is not what it expected, some other process is probably working on the value right now. If, for instance, a memory location holds the value 0, processes wanting to enter their critical sections each try to compare-and-swap it from 0 to 1; only one of them succeeds, and the others retry.
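A sketch of this in Python. Real compare-and-swap is a single uninterruptible machine instruction, so the internal lock below is only a stand-in for the hardware's atomicity; on top of it we build a busy-waiting spinlock:

```python
import threading

# Simulation of compare-and-swap: on real hardware this is one
# uninterruptible instruction; here an internal lock fakes the atomicity.
class Word:
    def __init__(self, value=0):
        self.value = value
        self._atomic = threading.Lock()

    def compare_and_swap(self, test, new):
        """Atomically: if value == test, set value = new. Return old value."""
        with self._atomic:
            old = self.value
            if old == test:
                self.value = new
            return old

# A spinlock built on compare-and-swap: 0 = free, 1 = taken
lock_word = Word(0)
counter = 0

def worker():
    global counter
    for _ in range(1000):
        while lock_word.compare_and_swap(0, 1) != 0:
            pass                            # busy wait until the lock is free
        counter += 1                        # critical section
        lock_word.compare_and_swap(1, 0)    # release the lock

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                              # 4000: no updates were lost
```

The `while ... pass` loop is exactly the busy waiting the text warns about: correct, but CPU time is burned while spinning.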
Exchange instruction This instruction takes a register address and a memory address, and exchanges the value of the address location with the register location.
The memory value is used as a key. A process which finds the memory value to be, say, 0, may enter its critical section after exchanging the memory value with its own register key, which for some process may be 1. In this example, the process in its critical section now has register key 0, while the memory location has the value 1 after the exchange. When it's done, it resets the memory value.
So what's so great about the machine-instruction approach? It's simple and easy to verify that it works. It can also support any number of processes, on any number of processors, with any number of critical sections. There are a few disadvantages though: both starvation and deadlock are possible, and busy waiting is employed, meaning CPU time is spent on waiting, which is inefficient.
You might want your processes to communicate, so they can work together and not compete for resources. There are a few ways to do this. Let's look at them.
A semaphore is simply an integer value which is used for signaling between processes. There are three atomic operations:
- Initialization of semaphore. It's set to a nonnegative value.
- semWait(s) decrements the value.
- semSignal(s) increments the value.
This means that processes waiting for some other process to finish can call semWait(s): s is decremented for each call, so the value keeps track of the current number of waiting processes. The processes they're waiting for can call semSignal(s) to increment s, and a waiting process can go on.

Semaphores work as broadcasting: everyone with access to the semaphore can ask to be signaled, so the signaling process needs some control over how many signals it sends. If you want all waiting processes to continue, you call semSignal(s) once per waiting process. As the semaphore is used as a counter for the number of waiting processes, it's called a counting semaphore or a general semaphore.
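This signaling pattern can be sketched with Python's `threading.Semaphore`, which behaves like the counting semaphore described; the three workers and the one-release-per-waiter loop are made up for illustration:

```python
import threading

# Counting semaphore used for signaling: three workers wait, and the main
# thread calls release() once per waiting worker, mirroring the text's
# "call semSignal(s) once per waiting process".
ready = threading.Semaphore(0)      # initialized to a nonnegative value
results = []

def worker(name):
    ready.acquire()                 # semWait(s): block until signaled
    results.append(name)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for _ in range(3):
    ready.release()                 # semSignal(s): free one waiter
for t in threads:
    t.join()
print(sorted(results))              # [0, 1, 2]: all three got through
```

Initializing the semaphore to 0 makes every `acquire()` block until a matching `release()` arrives, which is the pure signaling use of a semaphore.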
A variant of semaphores are binary semaphores, which can be either 0 or 1. They have the following atomic operations:
- Initialization of semaphore. It's set to 0 or 1.
- semWaitB(s): if the semaphore is 0, the calling process is blocked. If it is 1, the semaphore is set to 0 and the process continues.
- semSignalB(s): checks if any processes are blocked (i.e. they called semWaitB while the semaphore was 0). If there are blocked processes, one of them is freed. If no processes are waiting, the semaphore is set to 1.
A solution similar to the binary semaphore is the mutual exclusion lock (mutex). This is a binary flag which locks a resource: a process sets the flag to 1 when it acquires the resource, and to 0 when it's done with it. The difference from a binary semaphore is that the only process which can release the mutex is the one that acquired it. In semaphore terms, a process calls semWait(s) before a critical section, and semSignal(s) after the section.
Now as we can see, both counting and binary semaphores depend on a queue for the waiting processes. How can that be implemented? The fairest solution would be a FIFO queue. Some semaphores do not specify in which order processes are removed from its queue. These are called weak semaphores. Those that specify in which order processes are removed are called strong semaphores.
Semaphores might seem simple, but sometimes we need something more advanced. What if we encompass everything in an object? Let's do that, and call it a monitor. The monitor holds the data you want to protect as a field, has methods to access that data, and uses those methods to protect it. There is no way of accessing the data outside the monitor, and only one process may be in the monitor at once, so only one process accesses the data at any time.
A monitor has three characteristics:
- Local data variables are accessible only by the monitor's procedures and not by any external procedure
- A process enters the monitor by invoking one of its procedures
- Only one process may be executing in the monitor at a time
Further, we need some way to alert processes that something they're waiting for is done, so monitors have two methods:
- cwait(c), where c is some condition. The calling process is suspended on that condition.
- csignal(c), where c is some condition. A process suspended on that condition is resumed.
In the monitor we'll have a queue for each wait condition. When csignal is called on a condition, we take one process from that queue and let it do its stuff.
The obvious advantage of monitors over semaphores is that the programmer gets help with ensuring that the mutex requirement is fulfilled, and that the processes operate synchronized. Just imagine a banking account system with a simple counting semaphore implementation. Semaphore resource access works if all processes are programmed correctly. Monitor resource access works if the monitor(s) are programmed correctly. Which do you think is easier to verify?
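The banking example can be sketched as a monitor-style Python class, where a lock provides the monitor's mutual exclusion and a condition variable plays the role of cwait/csignal; the account and amounts are made up:

```python
import threading

# Monitor sketch for a bank account: the balance is only reachable
# through methods that hold the monitor's lock, and a condition variable
# stands in for cwait(funds_available)/csignal(funds_available).
class Account:
    def __init__(self):
        self._balance = 0
        self._lock = threading.Lock()
        self._funds_available = threading.Condition(self._lock)

    def deposit(self, amount):
        with self._lock:                        # only one process inside
            self._balance += amount
            self._funds_available.notify_all()  # csignal(funds_available)

    def withdraw(self, amount):
        with self._lock:
            while self._balance < amount:       # cwait(funds_available)
                self._funds_available.wait()
            self._balance -= amount
            return self._balance

account = Account()
t = threading.Thread(target=account.deposit, args=(100,))
t.start()
print(account.withdraw(60))   # blocks until the deposit lands, then 40
t.join()
```

Note that `withdraw` re-checks the condition in a `while` loop after waking up; this is the standard monitor discipline, since another process may have emptied the account between the signal and the wake-up.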
So we want to send messages between two processes, which means one needs to send the message, and another needs to receive it. There are many implementations of this, but it's usually something similar to:
Synchronization: If one process is to receive a message from another, some synchronization is implied, which in turn involves blocking and nonblocking cases. Three combinations are possible, though a system usually implements only two of them:
- Blocking send and receive: Both are required to wait until message passing is done.
- Nonblocking send, blocking receive: The sending process can continue execution after send operation, while the receiving waits for a message to be received before continuing.
- Nonblocking send and receive: Both get to continue. Danger of losing messages, as a receive can have been executed before a send.
Addressing: With direct addressing, both sender and receiver have to know the specific ID of the other process. With indirect addressing, you use some sort of shared data structure, like a queue for the messages (a mailbox). Mailboxes can be viewed as owned by the creating process, or by the OS.
Message format: The format of the messages depends on the OS implementation, the messaging facility and the needs of the programmer. Small fixed-size messages give less per-message overhead but require more messages; larger messages give more overhead per message but fewer of them. For large data you can simply pass a reference to a file where the data is placed.
A message may have a header with type, source and destination, length and control information, like messages passed over the Internet. Control information can give information about which instruction the message is related to, or info about other messages coming in.
Queueing discipline: Here you have three options:
- First-in, first-out (FIFO)
- Priority queueing, e.g. on message type or sender
- Let the receiver inspect the messages and select which to receive next
Mutual exclusion: We can implement mutual exclusion with message passing by having a mailbox for the resource. The mailbox starts with one message. The process wanting the resource receives the message, and is placed in a queue if it's not available. When the process is done with the resource, it returns the message to the mailbox.
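The token-in-a-mailbox scheme above can be sketched with a Python `Queue` as the mailbox (the worker names are made up):

```python
import queue
import threading

# Mutual exclusion via message passing: a mailbox holds a single token
# message; receiving it grants the resource, sending it back releases it.
mailbox = queue.Queue()
mailbox.put("token")        # the mailbox starts with one message
shared = []

def worker(name):
    msg = mailbox.get()     # blocks until the token is available
    shared.append(name)     # critical section: only the token holder here
    mailbox.put(msg)        # return the message to the mailbox

threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(shared))          # 5: every worker entered exactly once
```

Since `get()` blocks while the mailbox is empty, at most one worker holds the token at any time, which is exactly the mutual exclusion requirement.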
The essential requirement of memory management is to provide a memory abstraction which allows dynamically allocating portions of memory to a process. This section details different memory management strategies.
Terms in memory management:
|Frame||A fixed-size block of main memory.|
|Page||A fixed-size block of data in secondary memory, which can be copied temporarily to a frame of main memory.|
|Segment||A variable length block of data in secondary memory. The entire segment can be temporarily copied to main memory (segmentation), or it can be divided into pages and they can be copied individually to main memory (combined segmentation and paging).|
Physical representation: Physically, memory structures are usually stored in a tree-like hierarchy, with pointers to the next nodes.
There are several ways you can partition memory, and each has its own strengths and drawbacks. Let's look closer at them. But first, let's look closer at some challenges the partitioning methods meet.
Fragmentation describes a situation where space is wasted due to some design decision. In other words: high fragmentation means a lot of wasted space in memory.
Internal fragmentation: A process is assigned more space than it needs, so that other processes cannot utilize that memory. Eg. a process is assigned a partition of size 16 MB, but is of size 8 MB.
External fragmentation: When there are small holes in memory which are hard to utilize. An example is when a process is assigned a contiguous part of memory, then frees a part in the middle; that hole is hard to utilize, as only data of the freed size or smaller fits in it. To overcome external fragmentation you can use compaction, which means that the OS shifts the processes so that they form one contiguous block.
There are different ways to address locations in memory:
|Physical address||Actual address location.|
|Logical address||A reference to a memory location independent of the current assignment of data to memory. Must be translated to physical address|
|Relative address||Address given relative to some known point, usually a value in a register.|
A process is given a private address space, which is a private area of memory denoted by a base address and a size limit. Processes use relative addresses which are translated to physical addresses by the OS.
Memory management techniques
No memory management technique: The entire memory is addressable by every process. Usually a complete disaster when multiprogramming is involved. Therefore each process is given its own address space.
Fixed partitioning: Main memory is divided into partitions of fixed size, and processes can be loaded into a partition of equal or greater size, and will occupy the whole partition. It's easy to implement, and has little overhead for the OS. It can be implemented with either all partitions having the same size, or with different sizes of partitions. Either way, partition sizes are set at system start. It's very inefficient use of memory though as there will be a maximum number of processes, and internal fragmentation.
Dynamic partitioning: Each process is assigned exactly the space it needs. It's more efficient than fixed partitioning, and solves internal fragmentation. However it introduces external fragmentation, which in turn means inefficient use of the processor, as it has to run compaction to counter it.
Since compaction is expensive, it cannot be done all the time. We need effective placement algorithms instead. These are used to place a process in a free block of equal or larger size:
- Best fit: Places it in the block closest in size of the request.
- First-fit: Scans from top, and places it in the first block it fits in.
- Next-fit: Scans from last placement location, and places it in the first block it fits in.
Best-fit is actually the worst performer. First-fit is usually the best and fastest, but it tends to clutter the beginning of the address space with small leftover holes.
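A sketch of first-fit and best-fit over a list of free block sizes (the sizes are made up); each returns the index of the chosen block:

```python
# Placement algorithm sketches over a list of free block sizes,
# in address order. Both return the chosen block's index, or None.
def first_fit(blocks, request):
    for i, size in enumerate(blocks):
        if size >= request:
            return i                 # first block large enough
    return None

def best_fit(blocks, request):
    candidates = [(size, i) for i, size in enumerate(blocks) if size >= request]
    return min(candidates)[1] if candidates else None   # closest in size

free = [4, 10, 6, 8]        # free memory blocks in address order
print(first_fit(free, 5))   # 1: first block that fits (size 10)
print(best_fit(free, 5))    # 2: block closest in size (size 6)
```

Notice how best-fit picks the size-6 block, leaving a sliver of 1; producing such tiny leftovers is exactly why it performs worst in practice.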
Buddy system: In this system, memory blocks are available in sizes 2^K, where L ≤ K ≤ U: 2^L is the smallest block that can be allocated, and 2^U is the largest (typically the size of all available memory). A request of size s is satisfied by a free block of size 2^K with 2^(K−1) < s ≤ 2^K; if none exists, a larger block is split in half, repeatedly if necessary, and the two halves of each split are called buddies. When both buddies become free again, they are merged back into one block. The name comes from this pairwise splitting and merging.
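The rounding step of the buddy system can be sketched in a few lines: every request is served from a block whose size is the nearest power of two at or above the request (a full allocator with splitting and merging is omitted here):

```python
# Buddy system sketch: a request of size s is served from a block of the
# smallest power-of-two size that is >= s.
def buddy_block_size(s):
    size = 1
    while size < s:
        size *= 2
    return size

print(buddy_block_size(100))   # 128
print(buddy_block_size(257))   # 512
```

The gap between the request and the power-of-two block (e.g. 28 wasted words for a 100-word request) is the internal fragmentation the buddy system trades for fast splitting and merging.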
Simple paging: Here we divide memory into fixed-size frames, and processes into blocks of the same fixed size, called pages. We keep a list of available frames in memory. When you want to allocate memory to a process, you create a page table for the process, and place the process's pages into frames in memory. When you want to find data in memory, the page table gives you the frame address for a page. There's no external fragmentation, and only a small amount of internal fragmentation. Paging is invisible to the programmer.
Simple segmentation: Here we divide processes into segments, and these are loaded into dynamically sized segments of memory which do not need to be contiguous. There's a segment table for each process, which tells us where the segments are located. There's no internal fragmentation, but there is external fragmentation. Improved memory utilization compared to dynamic partitioning.
To release the programmer and user from the constraints of physical memory, we let them perceive a much larger memory. One part of it is held in main memory (called real memory), and the rest is located on disk. This larger, perceived memory is called virtual memory.
Some words that will help you understand this section:
|Virtual address||The assigned virtual address to a resource, so that it can be accessed as if it were a part of main memory.|
|Virtual address space||Virtual storage assigned to a process.|
|Address space||Range for memory addresses available to a process.|
|Real address||The address of a storage location in main memory.|
|Thrashing||When a system spends most of its time swapping rather than executing instructions, because the swapping algorithm keeps swapping out pages right before they are used.|
Some bits used in some of the addresses:
|P||Tells you if the addressed item is in main memory or not|
|M||Tells you if the item is modified since last time it was loaded into memory|
Virtual memory paging:
|Virtual address||Page number and offset|
|Page table entry||P and M bit, some other control bits and the frame number|
Here you take the page number and look it up in the page table. There you find the frame number. The frame number and the offset together give the real address.
Virtual memory segmentation:
|Virtual address||Segment number and offset|
|Segment table entry||P and M bits, some other control bits, length of the segment and the segment base|
Here you take the segment number and look it up in the segment table. There you find the segment base and the length of the segment. You add the offset from the virtual address to the base, and get the real address.
Combined virtual memory paging and segmentation: We want to combine virtual paging and segmentation, as both have advantages. Paging is transparent to the programmer, eliminates external fragmentation and provides efficient use of memory. As the pages are of fixed size, you can also develop sophisticated algorithms for swapping. Segmentation, on the other hand, is visible to the programmer, and can among other things handle growing data structures and modularity, and has great support for sharing and protection. In addition it has no internal fragmentation.
Let's combine, but how? We let segments be divided into pages. Then the system can view memory as pages, while the programmer can view memory as segments. This gives us the following address structure:
|Virtual address||Segment number, page number and offset|
|Segment table entry||Control bits, length and segment base|
|Page table entry||P, M, other control bits and frame number|
To the programmer, the page number and the offset together appear as the usual segment offset (from segmentation). The system uses the segment number to find the segment's page table, the page number to find the frame in that table, and the offset to find the location within the frame (page).
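Splitting such a virtual address into its three fields is just bit masking and shifting. The field widths below (4-bit segment number, 6-bit page number, 12-bit offset) are made-up assumptions for illustration:

```python
# Sketch of splitting a virtual address into segment number, page number
# and offset. The field widths are arbitrary illustration values.
SEG_BITS, PAGE_BITS, OFFSET_BITS = 4, 6, 12

def split_address(vaddr):
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    page = (vaddr >> OFFSET_BITS) & ((1 << PAGE_BITS) - 1)
    segment = vaddr >> (OFFSET_BITS + PAGE_BITS)
    return segment, page, offset

# Build an address for segment 2, page 5, offset 0x34 and split it again:
vaddr = (2 << 18) | (5 << 12) | 0x34
print(split_address(vaddr))   # (2, 5, 52)
```

With a 12-bit offset, pages are 2^12 = 4096 bytes, which also shows why page size is always a power of two: the offset is simply the low bits of the address.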
Page Replacement Algorithms
Now we've looked at memory management techniques which can be used. Now let's go further, and look at how it combines with the actual swapping of processes. You see, if one process in memory is not ready to be executed, then it takes up space in memory. So you want to switch out not ready processes, with those that are ready. Those that aren't ready are dumped to secondary storage. Which one do we dump out of main memory?
Swapping takes time, especially when using an HDD instead of an SSD. There are several algorithms that can make this choice for us, each with its own up- and downsides.
Not Recently Used (NRU)
Pages are given two flags: referenced and modified. Each time a page is read from, the referenced flag is set. Each time a page is written to, the modified flag is set. Every now and then (e.g. every 20 msec), a clock signal triggers the resetting of the referenced flag. Based on these flags, we can classify the pages at any given time in one of four classes:
|Class 0||Not referenced, not modified.|
|Class 1||Not referenced, modified.|
|Class 2||Referenced, not modified.|
|Class 3||Referenced, modified.|
NRU swaps out a random page from the lowest-numbered non-empty class.
First-In, First-Out (FIFO)
Oldest page gets swapped out first.
Least Recently Used (LRU)
Swaps out the page which has remained unused for the longest time. LRU is almost as good as the optimal strategy, but is difficult to implement. Could use a stack or similar structure.
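FIFO and LRU can be compared directly by counting page faults on the same reference string; the string and frame count below are made up:

```python
from collections import OrderedDict

# Count page faults for FIFO and LRU on the same reference string,
# with a fixed number of frames.
def fifo_faults(refs, frames):
    memory, faults = [], 0
    for page in refs:
        if page not in memory:
            faults += 1
            if len(memory) == frames:
                memory.pop(0)               # evict the oldest page
            memory.append(page)
    return faults

def lru_faults(refs, frames):
    memory, faults = OrderedDict(), 0
    for page in refs:
        if page in memory:
            memory.move_to_end(page)        # mark as most recently used
        else:
            faults += 1
            if len(memory) == frames:
                memory.popitem(last=False)  # evict least recently used
            memory[page] = True
    return faults

refs = [1, 2, 1, 3, 4, 1]
print(fifo_faults(refs, 3), lru_faults(refs, 3))   # 5 4
```

On this string LRU wins because FIFO evicts page 1, the oldest page, right before it is referenced again, while LRU keeps it as recently used.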
WSClock
A combination of the Working Set and Clock algorithms. We regard the memory as an initially-empty circular list of frames, and maintain a pointer to a single frame ("current page"). Each page has a referenced flag and a modified flag, as well as a 'time of last use' field. Again, periodic clock ticks reset the referenced flag. The algorithm:
t = some constant, e.g. 120 msec

On page fault (a page which is not in memory is requested):
    if current page is referenced:
        set current page as not referenced
        set current page to next page
        restart procedure
    else if time since current page's 'time of last use' is larger than t:
        if current page is modified:
            schedule disk write for current page
            set current page to next page
            restart procedure
        else:
            swap out current page
    else:
        set current page to next page
        restart procedure
This means it replaces the first page that has been neither referenced nor modified within the last time t. Referenced pages encountered before it are marked as not referenced, and modified pages are scheduled for disk write.
Additionally, we must consider the case where the current page pointer completes one entire circle in the circular list of frames.
In the case where at least one write has been scheduled, the algorithm simply continues until the write is complete, at which point the newly-written page will be ready for eviction.
On the other hand, if no writes have been scheduled, the first non-modified page will be swapped out. If there are no non-modified pages, the current page gets swapped out.
The UM-clock algorithm is a more powerful version of the clock algorithm because it uses two bits instead of one: the u bit (use bit) and the m bit (modified bit). Both bits are associated with every page in main memory, and hence with every frame. Each frame falls into one of four categories:
- Not accessed recently, not modified (u = 0, m = 0)
- Not accessed recently, modified (u = 0, m = 1)
- Accessed recently, not modified (u = 1, m = 0)
- Accessed recently, modified (u = 1, m = 1)
With this classification, the UM-clock algorithm performs as follows:
- Beginning at the current position of the pointer, scan the frame buffer. During this scan, make no changes to the use bit. The first frame encountered with (u = 0, m = 0) is selected for replacement.
- If step 1 fails, scan again, looking for the frame with (u = 0, m = 1). The first such frame encountered is selected for replacement. During this scan, set the use bit to 0 on each frame that is bypassed.
- If step 2 fails, the pointer should have returned to its original position and all of the frames in the set will have a use bit of 0. Repeat step 1 and, if necessary, step 2. This time, a frame will be found for the replacement.
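The three-step scan above can be sketched as follows, representing frames as dicts with u and m bits (a simplified model, not a real kernel structure):

```python
# A sketch of UM-clock frame selection.
def um_clock_select(frames, pointer):
    """frames: list of dicts with 'u' and 'm' bits; pointer: current index.
    Returns the index of the frame selected for replacement.
    Clears use bits on bypassed frames, as the algorithm specifies."""
    n = len(frames)
    while True:
        # Step 1: look for (u = 0, m = 0) without changing any bits.
        for i in range(n):
            idx = (pointer + i) % n
            if frames[idx]['u'] == 0 and frames[idx]['m'] == 0:
                return idx
        # Step 2: look for (u = 0, m = 1), clearing the use bit of
        # each frame that is bypassed.
        for i in range(n):
            idx = (pointer + i) % n
            if frames[idx]['u'] == 0 and frames[idx]['m'] == 1:
                return idx
            frames[idx]['u'] = 0
        # Step 3: all use bits are now 0; repeating steps 1-2 must succeed.
```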
A file system is an abstraction to store, retrieve and update a set of files. A file is a logical unit of information created by a process. Information stored in a file is persistent, and can be accessed by different processes.
There are many different ways to logically structure a file. Here is a list of three common structures:
- Byte sequence
- Maximum flexibility. The OS does not help, and does not get in the way. UNIX, MS-DOS and Windows use this.
- Record sequence
- Rigid. Reads and writes entire fixed-size records at once. Old, punch card-based systems used this, but it is really unpopular now.
- Tree
- Tree of variable-size records, allowing quick lookup on a specific key. Some mainframes use this.
Operating systems often support different file types. Here are some common file types:
|Regular files||Good ol' regular binary files.|
|Directories||System files for maintaining the structure of the file system.|
|Character special files||IO-devices masquerading as byte-sequenced files to abstract writing and reading to and from IO devices.|
|Block special files||IO-devices masquerading as record-sequenced files to abstract writing and reading to and from IO devices.|
A file system usually organizes its files in some sort of directory organization. Here are some different ways to organize files:
|Single-Level Directory Systems||Every file is in one large root directory. Often used on simple embedded devices.|
|Hierarchical Directory Systems||Allows directories to contain other directories, creating a directory tree. Tree directory systems allow for sane organization of files.|
File System Layout
A file system is typically stored in a partition on a disk. Every partition starts with a boot block, but beyond that, the on-disk layout can vary wildly from file system to file system. The layout described below is that of a FAT file system:
|Boot block||Here, the boot block contains administrative data about the file system, in addition to the bootstrapping program which is run if an operating system attempts to boot the partition.|
|File allocation table||The file allocation table (FAT) contains a list of all the clusters/chunks in the file data area, together with a pointer to the next cluster for each cluster.|
|Root directory||The root directory is the statically allocated logical root of the file system, containing files and subdirectories.|
|File data area||The file data area contains the rest of the data, organized in clusters as referenced from the file allocation table.|
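As an illustration of how the FAT links clusters, here is a sketch that walks a cluster chain; the end-of-chain marker and the dict representation are simplifying assumptions (real FAT variants use special reserved cluster values):

```python
# A sketch of following a FAT cluster chain.
EOC = -1  # illustrative end-of-chain marker

def cluster_chain(fat, start):
    """fat: maps each cluster number to the next cluster of the file (or EOC).
    Returns the ordered list of clusters making up the file."""
    chain = []
    cluster = start
    while cluster != EOC:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain
```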
Organization of the I/O-function
With programmed I/O the processor issues an I/O command, then busy-waits until the I/O operation is complete. This is obviously inefficient.
With interrupt-driven I/O the processor issues the I/O command. If the instruction is non-blocking, the processor continues executing the program and is interrupted when the I/O operation is done. If the operation is blocking, the process is put in a blocked state and cannot continue until the I/O operation completes.
Direct memory access (DMA)
With direct memory access (DMA) a DMA module controls the exchange of data between main memory and the I/O module. The processor sends a transfer request to the DMA module and is interrupted once the DMA is done and the entire block has been transferred.
In the request, the CPU tells the DMA module whether the operation is a read or a write, the address of the I/O device, the starting memory location to read from or write to, and the number of words to be read or written.
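As a sketch, the request can be pictured as a small record with those four fields (the field names are made up for illustration):

```python
# A sketch of the four fields the CPU hands the DMA module.
from dataclasses import dataclass

@dataclass
class DMARequest:
    is_read: bool        # read from the device, or write to it
    device_address: int  # which I/O device
    memory_start: int    # first memory location to transfer to/from
    word_count: int      # number of words to transfer
```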
There are three major designs for connecting the DMA module. You can have the DMA, CPU and I/O modules share a single bus, but this is obviously not very efficient. Alternatively, keep a single system bus but give the DMA module a direct path to the I/O devices; this may be called integrated DMA and I/O. A third option is to have one system bus and a separate I/O bus between the DMA module and the I/O devices; that is, not a direct path to each device, but one common bus for all I/O devices.
I/O also involves buffering: the OS stages data in main memory, shortly before and after the actual I/O transfer, rather than moving it directly between the device and the user process. This is done for two reasons:
- A program needing I/O must wait for the I/O to complete, whether by busy waiting or via an interrupt. I/O devices are slow compared to most other components, so the wait can be very long.
- Transferring data directly into a user process's memory interferes with the OS's swapping of processes: the OS cannot swap a process out of memory while that process has locked memory locations for an ongoing block transfer.
It's important to note that I/O devices transfer and store data differently. Some devices are block-oriented: they store and transfer data in fixed-size blocks. Drives, USB keys and other secondary storage do this. Other devices are stream-oriented: they transfer data in and out as a stream of bytes. Printers, communication ports (e.g. LAN), mice, keyboards and other pointing devices do this (and mostly everything that's not secondary storage). Stream-oriented transfers can be either byte-at-a-time or line-at-a-time.
Now we have three ways of implementing this:
- With a single buffer, the OS assigns a part of main memory to the I/O operation when it is requested. It reads a block into the buffer, transfers it to user space, and immediately starts reading the next block. This is called reading ahead.
- With a double buffer, the process can transfer to/from one buffer while the OS fills or empties the other. This means it can, e.g., read and write data at the same time.
- The third option is a circular buffer, which is simply any arrangement of more than two buffers.
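A minimal sketch of a circular buffer with an arbitrary number of slots (the interface is an assumption, and the locking a real OS would need is omitted):

```python
# A sketch of a circular buffer: the producer fills slots, the consumer drains
# them, and both wrap around the end of the slot array.
class CircularBuffer:
    def __init__(self, slots):
        self.buf = [None] * slots
        self.head = 0    # next slot to read (consumer side)
        self.tail = 0    # next slot to write (producer side)
        self.count = 0

    def put(self, block):
        """Producer (e.g. the I/O device) fills the next slot; False when full."""
        if self.count == len(self.buf):
            return False
        self.buf[self.tail] = block
        self.tail = (self.tail + 1) % len(self.buf)
        self.count += 1
        return True

    def get(self):
        """Consumer (e.g. the user process) drains the oldest slot; None when empty."""
        if self.count == 0:
            return None
        block = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.count -= 1
        return block
```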
By using buffering we can smooth out peaks in I/O demand, but no amount of buffering lets an I/O device keep up with a process indefinitely. It increases performance, but it isn't perfect.
Disks are very slow, especially compared to other components in the computer.
|Seek time||Time it takes to position the head at the track.|
|Rotational delay||Time it takes for the appropriate sectors to line up with the head.|
|Access time||Sum of seek time and rotational delay.|
|Transfer time||The time it takes to perform the read or write operation.|
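As a worked example of these terms: average access time is the seek time, plus (on average) half a rotation of rotational delay, plus the transfer time for the data. The drive parameters below are made up for illustration:

```python
# A worked example of the disk timing terms.
def avg_access_ms(seek_ms, rpm, sector_bytes, track_bytes):
    rotation_ms = 60_000 / rpm                           # one full revolution
    rotational_delay = rotation_ms / 2                   # on average, half a revolution
    transfer = rotation_ms * sector_bytes / track_bytes  # fraction of a revolution
    return seek_ms + rotational_delay + transfer

# 4 ms average seek, 7200 RPM, one 512-byte sector, 512 KiB per track:
# rotation ~8.33 ms, rotational delay ~4.17 ms, transfer ~0.008 ms
```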
As data is often spread across different tracks (seemingly at random), we cannot rely on improved disk specs alone. We need some extra techniques to help us.
Disk Scheduling policies
Most of the time is spent seeking, since sector access involves selecting tracks essentially at random. We need to improve this. How about helping the disk by letting the OS order the I/O requests in some way, so that less time is spent on seeks?
Clever. There are several policies for this, divided into two groups. The first group uses the requested data to determine which request to serve first, while the other group does not depend on the request at all. We'll start with the latter group.
Selection according to requestor
|Random||A benchmark used to compare these policies: random scheduling simply picks a random I/O request and passes it to the disk.|
|First-in-first-out (FIFO)||A simple queue: new requests at the end, oldest served first. This is the fairest of them all, as the oldest request always goes first. It performs well if the requests are to clustered file sectors, but it will often approximate random scheduling in performance.|
|Priority (PRI)||Perform I/O according to the priority of the process that sent the request. This leaves control over I/O operations outside the disk queue management, so it is not used to optimize disk utilization, but rather when other goals of the system are more important.|
|Last-in-first-out (LIFO)||A stack: serve the newest request first. This tends to give little or no arm movement when a process sends successive requests to nearby sectors, which increases throughput. There is a risk of starvation, though: an old request may wait indefinitely while new requests keep arriving.|
Selection according to requested item
|Shortest service time first (SSTF)||Selects the I/O request requiring the least movement of the disk arm, i.e. always the minimum seek time from the current position. As the arm can go both ways, two requested locations may be equidistant; randomness then decides which way to go. It performs better than FIFO, but does not guarantee minimum average seek time, as it is greedy and only looks one step ahead.|
|SCAN||The arm moves in one direction only, satisfying all requests along the way, until it reaches the last track in that direction; then it reverses. This prevents starvation. It performs similarly to SSTF unless the request pattern is very unusual, but it does not exploit locality as well as SSTF, as it tends to re-traverse the route it has most recently covered.|
|SCAN with LOOK policy||Same as SCAN, but the arm need not travel all the way to the last track: it looks ahead and reverses at the last request in each direction.|
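To make the difference concrete, here is a sketch computing the service order under SSTF and under SCAN with LOOK (the track numbers are invented, and tie-breaking and direction handling are simplified):

```python
# Sketches of SSTF and SCAN-with-LOOK service orders.
def sstf_order(head, requests):
    """Greedily service whichever pending request is closest to the head."""
    pending, order = list(requests), []
    while pending:
        nearest = min(pending, key=lambda t: abs(t - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order

def look_order(head, requests):
    """SCAN with LOOK, moving toward higher tracks first: service requests
    at or above the head in increasing order, then the rest going back down."""
    up = sorted(t for t in requests if t >= head)
    down = sorted((t for t in requests if t < head), reverse=True)
    return up + down
```

With the head at track 50 and requests for tracks 15, 62, 40 and 95, SSTF services 40 first (closest), while LOOK sweeps upward through 62 and 95 before reversing.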
SCAN (with or without LOOK) has some problems. It favors the latest-arriving jobs, and it also favors tracks nearest to both the innermost and outermost tracks. We need to look further. The following algorithms address these issues.
|C-SCAN (circular SCAN)||Scans in one direction only; when the last track is reached, the arm returns to the opposite end of the disk and the scan begins again. This reduces the maximum delay experienced by new requests: with SCAN, if the expected time for a scan from inner track to outer track is t, the expected service interval for sectors at the periphery is 2t; with C-SCAN it is t + s_max, where s_max is the maximum seek time.|
|N-step-SCAN||Segments the request queue into subqueues of length N. Subqueues are processed one at a time, using SCAN; while a queue is being processed, new requests must be added to some other queue.|
|FSCAN||Two queues. When a scan starts, all requests are in one queue, and the other is empty; new requests go into the empty queue. Thus, new requests must wait until the next scan.|
Disk cache is a buffer in main memory for disk sectors which exploits the principle of locality . When a request for a particular sector arrives, it checks if the requested data is in cache, and if it is the request is serviced by the cache. If it isn't in cache, it's loaded into cache. There's a huge design consideration when it comes to caching: Which sector in cache should be replaced?
|Least recently used (LRU)||Replace the least recently used, that is the block which has been in memory longest without any reference to it.||A stack of pointers. Top is the most recently used block, while the bottom is the least recently used. When a block is used, move it to the top of the stack. When you need to swap, remove the bottom one, and push the new one on the top of the stack.|
|Least frequently used (LFU)||Replace the least frequently used.||Have a counter on each block. Increment on use.|
LFU may seem good, but it can be biased. Say a block is unused for a long time, then suddenly used heavily because of the principle of locality: it ends up with a huge counter even though it may not be used again for a long time. What now? This led to FBS, which combines LRU and LFU.
Frequency-based stack (FBS)
You have a stack as with LRU and a count on each block as with LFU; the most recently used block is on top of the stack, the least recently used on the bottom. The stack is divided into three sections: "New", "Middle" and "Old". Only blocks in the "Old" section are eligible for replacement. Reference counts are not incremented for blocks in the "New" section; that is, a block's count is incremented only when it is referenced while in the "Middle" or "Old" section. When a new block is referenced, its count is set to 1 and it is placed on top of the stack (in the "New" section).
Multicore and multi computers
UMA (uniform memory access)
Every CPU accesses every memory location with the same latency.
NUMA (non-uniform memory access)
Every CPU can access all of memory, but accessing memory local to a CPU is faster than accessing memory attached to another CPU.
Multi CPU OS implementations
|Master/slave||One CPU is responsible for giving tasks to the other CPUs. Scales badly, as message passing takes up more and more time and space on the bus as the number of CPUs increases. For small scale, however, it is a simple and good system.|
|To each their own||Each CPU has its own private OS (i.e. they can be different). The OS code can be shared this way. A huge minus, however, is the lack of a shared I/O buffer: if one CPU performs an I/O operation, the other CPUs cannot, for instance, read from its I/O buffer and thereby share the workload. For this reason this model is rarely, if ever, used.|
|Symmetric multiprocessor (SMP)||Each CPU runs its own fraction of the same OS, being responsible for some system tasks. This makes for a complicated OS (especially for a general n-CPU system), and synchronization issues pose a significant challenge. Still, this is the most widely used model, as it scales well.|
|Test-and-set||Check whether memory is locked; if not, lock it and then read or write. The check and the lock are treated as one atomic operation.|
|Peterson's algorithm||A software solution to mutual exclusion.|
These are approaches to exploiting true parallelism in a multiprocessor system: determining which thread to run, and which CPU to run it on.
Load sharing
Each processor, when idle, selects a thread from a global ready queue. Advantages: the load is distributed evenly across processors, ensuring no processor is idle while work is available, and no centralized scheduler is required; when a processor becomes available, the scheduling routine of the operating system is run on that processor to select the next thread. Disadvantages: the central queue is a potential bottleneck, due to mutual exclusion on the area of main memory it occupies. Preempted threads are unlikely to resume execution on the same processor, losing any benefit of caching. And since all threads are treated as a common pool, it is unlikely that all the threads of a program gain access to processors at the same time; if a high degree of coordination is required, the resulting process switches may compromise performance.
|Smart scheduling||Critical threads may be set to spin on their tasks instead of being switched out.|
|Affinity scheduling||Have same CPU run the same thread again for good cache and TLB use.|
Gang scheduling
A set of related threads is scheduled to run on a set of processors at the same time, on a one-to-one basis. Advantages: less process switching, since closely related threads execute in parallel, reducing synchronization blocking; scheduling overhead is also reduced, since a single decision affects a number of processes/processors at a time. Disadvantages: it requires processor allocation.
Dedicated processor assignment
A pool of processors; each program is allocated a number of processors equal to its number of threads. This is the opposite of load sharing: an extreme version of gang scheduling where the processor assigned to a given thread is dedicated to that thread until the application is done. When there are many processors available and the system is highly parallel, the cost of switching would exceed the cost of an idle processor (in terms of performance). Advantages: good for highly parallel systems with many processors. Disadvantages: poor for systems with few processors.
Dynamic scheduling
The number of threads in a process can be altered during the course of execution. For some applications it is possible to provide language and system tools that permit the number of threads to be altered dynamically, allowing the operating system to adjust the load for better utilization. Advantages: for applications that can take advantage of it, this approach is superior to gang scheduling and dedicated processor assignment. Disadvantages: overhead.
Virtual machines
A virtual machine is a software computer. A crash is less critical, since the VM is just a program that can be killed and restarted.
|Type 1 hypervisor||Runs directly on the hardware, with no OS underneath. When the guest OS makes system calls, they trap to the hypervisor.|
|Type 2 hypervisor||Runs on top of a host OS (VirtualBox, VMware). Runs normal instructions directly on the hardware, but has to emulate sensitive instructions.|
|Paravirtualization||The guest OS is modified so that sensitive instructions are replaced with explicit calls to the hypervisor (hypercalls).|