TDT4190: Distributed systems

> Better than monotonic systems.

(This is "funny" if you've taken the course [TDT4205](/TDT4205))

# Characterization of Distributed Systems (Chapter 1)

**Distributed systems:** Components located at networked computers which communicate and coordinate by passing messages between them. This definition has three big implications:

- **Concurrency:** All computers connected in a network are able to do work at the same time.
- **No global clock:** When programs are executing a task together, in a network with delays, how can they agree on what time it is? They need some kind of system to handle the timing of actions.
- **Independent failures:** All computer programs fail. Distributed systems can fail in even more ways than your regular non-network-connected software. When two computers are working together and one of them stops responding, it's hard for the other one to know whether that computer is dead, the network has failed, or it has just become terribly slow.

## Why use a distributed system

- Enables sharing of resources between multiple users.
- Applications that allow multiple users to communicate are not possible without distributed systems.
- Allows for higher scalability than a single computer.

## Challenges in a distributed system

### Heterogeneity

Distributed systems may consist of many different types of networks, hardware, operating systems, programming languages and differing implementations. For instance, a computer connected to the internet through Ethernet should be able to communicate with someone using Wi-Fi. Likewise, a computer running a UNIX-based operating system needs to be able to exchange data with a Windows computer (although it would much rather just hang out with its Linux friends). To overcome such differences, we need standards for how data should be exchanged, agreed upon by everyone.

A solution that helps disguise these differences between systems is called _middleware_. Middleware takes care of all the peculiarities of a specific system and provides a consistent API to program against.

Virtual machines are another solution to the problem of heterogeneity. A virtual machine lets you run the same code on many different kinds of machines. The JVM (Java Virtual Machine) is a very common example of this. One virtual machine needs to be implemented for each system it should run on, but then you only need to compile code for that single virtual machine.

### Openness

Specifications and documentation for the software interfaces of network components. RFCs (Request For Comments) describe how most of the internet protocols are constructed. The W3C develops and publishes standards for the web.

### Security

Security in distributed systems includes:

- **Confidentiality:** Avoiding unauthorized access to systems.
- **Integrity:** Avoiding corruption of data.
- **Availability:** Securing against DoS attacks.

### Scalability

A system is scalable if it remains efficient even under high demand. Achieving a scalable system is not trivial and involves a combination of optimizing hardware and software. Software should try not to introduce limitations (I'm looking at you, IPv4), and should allow the system to grow with demand. Scalability can't always be solved by doubling the amount of resources, as that rarely leads to a doubling of performance. The system should have few bottlenecks, as they will lead to poor performance. They might also become a single point of failure that can take down the whole system if all traffic needs to go through them.
### Failure handling

- Detecting failures
- Masking failures
- Tolerating failures
- Recovering from failures
- Redundancy

### Concurrency

Making sure data remains consistent when it is accessed from more than one place at the same time.

### Transparency

To ease the use of a distributed system, the system should try to hide the separation of different components.

Access transparency
:   Allowing both local and remote resources to be accessed with identical operations.

Location transparency
:   Accessing resources without knowing their physical network location.

Concurrency transparency
:   Allowing multiple processes to operate on the same shared resource without interfering with each other.

Replication transparency
:   Letting multiple instances of a resource exist achieves higher reliability and performance. Replication transparency lets application programmers and users ignore these details.

Failure transparency
:   Handling failures so that applications can still complete their tasks.

Mobility transparency
:   Letting resources remain available even though they're being moved around.

Scaling transparency
:   Allows a system to be scaled without changing applications or the structure of the system.

### Quality of service

- Reliability
- Security
- Performance
- Adaptability

# System Models (Chapter 2)

- Physical models
- Architectural models
- Fundamental models

# Remote Invocation (Chapter 5)

## Request-reply Protocols

A pattern on top of the most basic message passing between entities. Request-reply protocols support synchronous two-way exchange of messages, and are used as the basis for [RPC](#remote-procedure-call-rpc) and [RMI](#remote-method-invocation-rmi).

### Operations

- `doOperation`: Used by clients to execute remote operations. The caller of `doOperation` blocks until the server has executed the requested operation and the answer has reached the client.
- `getRequest`: Used by a server to acquire requests from clients.
- `sendReply`: When the server has invoked the requested operation, it sends a reply back to the client. The client then unblocks and continues its execution.

If `doOperation`, `getRequest` and `sendReply` are sent over UDP, the messages may arrive in the wrong order or not arrive at all. Therefore, `doOperation` uses a timeout when waiting for a response from the server. When an operation times out, `doOperation` may:

- Return an indication to the client that the operation has failed (this is very simple, and rarely used).
- Keep re-sending the request until it is reasonably certain that the lack of an answer is because the server doesn't respond, rather than because messages are getting lost along the way. The timeout may be caused by either the request or the reply message being lost in the network.

When a request message is re-sent, the server may receive it multiple times. We want the server to avoid executing an operation more than once for the same request. If the server hasn't sent a reply when it receives a duplicate request, it doesn't have to do anything special; it will simply send a reply when it is done. If the server has already sent an answer when it receives an identical request, it will have to re-execute the operation to achieve the same result, unless the result was saved the first time. Some operations can be executed multiple times and always produce the same result; such operations are called idempotent.
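To make the retransmission scheme concrete, here is a minimal sketch of the client side of `doOperation` over UDP in Java. The class name, the 4-byte request-ID header and the timeout/retry constants are assumptions made for this example; they are not prescribed by the protocol description above.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Sketch of the client side of a UDP request-reply exchange: send the request,
 * wait for the reply with a timeout, and resend on timeout. A request ID in the
 * header lets the server recognise duplicates and lets the client ignore stale
 * replies. Not thread-safe; purely illustrative.
 */
public class RequestReplyClient {
    private static final int TIMEOUT_MS = 2000; // how long to wait for each reply
    private static final int MAX_RETRIES = 5;   // resend attempts before giving up

    private final DatagramSocket socket;
    private int nextRequestId = 0;

    public RequestReplyClient() throws IOException {
        socket = new DatagramSocket();
        socket.setSoTimeout(TIMEOUT_MS);
    }

    /** Blocks until a matching reply arrives, or fails after MAX_RETRIES timeouts. */
    public byte[] doOperation(InetSocketAddress server, byte[] request) throws IOException {
        int requestId = nextRequestId++;
        // Header: 4-byte request ID, followed by the marshalled request itself.
        byte[] message = ByteBuffer.allocate(4 + request.length)
                .putInt(requestId).put(request).array();
        DatagramPacket out = new DatagramPacket(message, message.length, server);

        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            socket.send(out); // (re)send; the server must tolerate duplicates
            try {
                byte[] buffer = new byte[8192];
                DatagramPacket in = new DatagramPacket(buffer, buffer.length);
                socket.receive(in); // blocks until a datagram arrives or the timeout fires
                ByteBuffer reply = ByteBuffer.wrap(in.getData(), 0, in.getLength());
                if (in.getLength() >= 4 && reply.getInt() == requestId) {
                    // Matching reply: strip the header and return the payload.
                    return Arrays.copyOfRange(in.getData(), 4, in.getLength());
                }
                // Otherwise it was a late reply to an older request: ignore it and keep waiting.
            } catch (SocketTimeoutException e) {
                // Request or reply was lost (or the server is slow): fall through and resend.
            }
        }
        throw new IOException("No reply after " + MAX_RETRIES + " attempts");
    }
}
```

The request ID serves two purposes in this sketch: the server can use it to detect duplicate requests (and replay a saved reply instead of re-executing a non-idempotent operation), and the client uses it to discard late replies belonging to earlier requests.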
#### Request and reply over TCP

TCP is considered a reliable channel, which removes the need to re-send messages and also removes any issues with non-idempotent operations. Sometimes, however, TCP is unnecessary: it introduces overhead and features which are often not needed, such as extra messages for setting up the connection when you are often sending only very small amounts of data.

#### HTTP

HTTP is an example of a request-reply protocol. Client requests specify a URL which includes the hostname of the server. The server supports a fixed set of methods (GET, POST, PUT, ...). It also allows clients to specify what kind of content they want to receive (JSON/XML), and supports authenticated requests. HTTP is implemented over TCP.

## Remote Procedure Call (RPC)

Functions on remote machines may be called as if they were in the local address space. The underlying RPC system hides all the ugly details of the network, such as encoding and decoding of parameters and the result of the procedure call. In RPC, you use interfaces to control what kind of communication is possible between modules. Only information available through the interface will be accessible. Sun RPC and SOAP are among the most common implementations of RPC.

## Remote Method Invocation (RMI)

Like RPC, but for objects. It lets you refer to, and call methods on, objects on remote machines. Java RMI is one implementation of RMI.

# Peer-to-peer Systems (Chapter 10)

Networks that don't distinguish between clients and servers, and where all nodes in the network both supply and consume data.

Peer-to-peer systems can be classified into three generations. The first generation was introduced by Napster in 1999. The next generation came with file-sharing services such as Gnutella, Kazaa and BitTorrent.

## Peer-to-peer middleware

The third generation introduced dedicated middleware layers which provide application-independent management of distributed resources.

### Functional requirements for peer-to-peer middleware

- Allow clients to find and communicate with any individual resource that is made available to a service, even though the resources are widely distributed.
- Clients should be able to add new resources and remove them when they want to.
- Clients should be able to add hosts to the service, and be able to remove them.
- Like other middleware, peer-to-peer middleware should provide a simple API for application programmers.

### Non-functional requirements for peer-to-peer middleware

- Global scalability.
- Load balancing.
- Optimizing for interaction between neighbors.
    - Neighbors in the P2P system aren't necessarily close in the physical network.
    - Placing data near the users that use it most.
- Allow dynamic availability.
    - Routing should keep functioning even while machines are connecting and disconnecting.
- Security.
- Anonymity.

## Routing overlays

A routing overlay is a distributed algorithm responsible for locating nodes and objects in a peer-to-peer system.
Since no single node can know about every object, the key is to ask a node that might know more than you do. Routing overlays must handle search, insertion, deletion and churn.

### Unstructured

- **Examples:** Gnutella (older versions), Freenet
- The simplest form of P2P network.
- All nodes are equal.
- Nodes may connect to any other node in the network.
- No planned structure.
- A node cannot know if another node has more information than itself.
- **Pros:**
    - Simple
    - All nodes retain full control of their data
    - Nodes decide for themselves which nodes to connect to
- **Cons:**
    - Searches need to be broadcast
    - No guarantee that a search reaches every node

### Hybrids (super nodes)

- **Examples:** Kazaa, Skype, Gnutella (newer versions)

### Structured

- Distributed hash tables are the most common
    - Chord, CAN, Pastry, Tapestry
- Fixed structure
- Nodes find their position based on an ID
    - Normally a hash based on IP address, etc.
    - Tries to achieve a uniform distribution
- Nodes and data are contained in the same address space
    - Data is stored near a node with the same ID
- Lookup is usually O(log n)
- ID-based positioning makes search efficient and targeted
- Robust against failures
- **Pros:**
    - Lookup is fast and guaranteed to succeed
    - Good routing guarantees
- **Cons:**
    - Can only do lookup based on ID
    - The structure controls who stores what

### Pastry

- A distributed hash table
- Shared 128-bit address space for nodes and objects
- Every node gets a unique ID _(GUID)_
- Every object gets a unique ID _(GUID)_
- Circular routing based on prefix matching
- Every node has a non-complete routing table
- Lookup: O(log n)
- Normally uses UDP
- A Java implementation is available at freepastry.org

#### Connecting a new node

- The new node _X_ creates a GUID.
- _X_ finds its nearest neighbor in the Pastry network.
- _X_ sends a join message, with its own GUID as the destination, via this neighbor.
- The message is routed as a lookup.
- It is received by _Z_, the node whose ID is most similar to _X_'s.
- Nodes which route the message further attach parts of their routing tables, which are used to build _X_'s routing table.
- Node _Z_ sends _X_ a copy of its leaf set.
- _X_ sends its routing table and leaf set to all the nodes that appear in them.

##### Example from the exercises

> A node with GUID 54C contacts node AF3 and wants to join the Pastry network. What happens?

Node 54C finds the node which is physically closest in the network (here AF3). This node sends a join message on 54C's behalf, and the message is routed as a lookup on the GUID 54C. The node which has the longest prefix match with 54C, in this case 54B, finally receives the message. In this particular case, the node 54B was in AF3's routing table, so the message was routed directly. Every node that routed the join message further added parts of its own routing table to build 54C's routing table. Node 54B (which received the join message) added its leaf set to 54C's routing table. Finally, 54C sends its routing table and leaf set to all nodes that appear in either of them. This way, they are now able to route to 54C.

#### Disconnecting

- A node _X_ which doesn't answer its neighbors is removed from the ring.
- The node that discovers the failure finds another node near _X_ and asks for its routing table.
- The node repairs its routing table.
- The node sends out information about the failure and its new routing table to its neighbors.
- Routing only breaks down if _l_ adjacent nodes fail at the same time (where 2 _l_ is the size of the leaf set).
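The prefix-based next-hop choice above can be illustrated with a small, simplified Java sketch. GUIDs are shown as hex strings and the routing table as a plain 2D array indexed by (shared prefix length, next digit); these representations, and all class and method names, are assumptions made for the example. Real Pastry additionally consults its leaf set and falls back to any known node that is numerically closer to the key, which is omitted here.

```java
/**
 * Sketch of the next-hop choice in Pastry-style prefix routing.
 * routingTable[p][d] holds (if known) a node whose GUID shares p leading
 * digits with this node's GUID and has digit d in position p.
 */
public class PrefixRouter {
    private final String ownGuid;           // this node's GUID, e.g. "AF3"
    private final String[][] routingTable;  // one row per prefix length, one column per hex digit

    public PrefixRouter(String ownGuid, String[][] routingTable) {
        this.ownGuid = ownGuid;
        this.routingTable = routingTable;
    }

    /** Number of leading hex digits two GUIDs have in common. */
    static int sharedPrefixLength(String a, String b) {
        int i = 0;
        while (i < a.length() && i < b.length() && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    /**
     * Returns the GUID of the next hop for a lookup of `key`, or null if this
     * node knows no node with a longer matching prefix (the lookup ends here,
     * or the leaf set would take over in real Pastry).
     */
    public String nextHop(String key) {
        int p = sharedPrefixLength(ownGuid, key);
        if (p == key.length()) {
            return null; // exact match: the key lives on this node
        }
        int nextDigit = Character.digit(key.charAt(p), 16);
        String candidate = p < routingTable.length ? routingTable[p][nextDigit] : null;
        // Forward to a node that shares at least one more digit with the key, if we know one.
        if (candidate != null && sharedPrefixLength(candidate, key) > p) {
            return candidate;
        }
        return null;
    }
}
```

With `ownGuid = "AF3"` and a routing table entry `routingTable[0][5] = "54B"`, a call to `nextHop("54C")` returns `"54B"`, matching the routing step in the exercise example above.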
# Security (Chapter 11)

*except 11.5, 11.6.3 and 11.6.4*

# Distributed File Systems (Chapter 12)

## Network File System (NFS)

Provides transparent access to remote files. Any computer can act as both a server and a client, and its files can be made available for remote access. When reading files, NFS only caches them in RAM, so the cached files do not persist across a reboot. It's common practice to have some machines act as dedicated servers.

## Andrew File System (AFS)

The Andrew File System (henceforth abbreviated AFS) is a distributed computing environment originally developed for use as a campus computing and information system. AFS makes a local copy of the whole file when it is accessed, and stores this copy on disk. These copies persist through reboots. This makes AFS well suited for systems where the users have a computer of their own which they use regularly. It is, however, not suited for systems where many users use many different computers and do not use the same computer each time.

## Bigtable

Bigtable is a compressed, high-performance data storage system built on the Google File System and some other Google services.

[incomplete]