
TDT4145: Data Modelling and Database Systems

# Introduction: Terms and expressions

|| Database || A database is a collection of data. ||
|| Data || Facts that can be recorded and that have an implicit meaning. ||
|| Miniworld or Universe of Discourse (UoD) || A database represents a part of the real world, and we call this part a miniworld or UoD. Changes in the real world are reflected in the database. ||
|| Database model || A type of data model that determines the logical structure of a database. A database model can be represented in many different ways, e.g. through an [ER-model](#entity-relationship-model). ||
|| Database management system (DBMS) || A collection of programs that allows users to create and maintain a database. ||
|| Database system || Refers collectively to the database, the database model and the DBMS. ||

# Entity Relationship model

An Entity Relationship model is a data model for describing the structure and organization of data that is ultimately to be implemented in a database. An ER-model is usually described through an ER-diagram. The ER-model is for the most part composed of entities, which represent objects or things, and relationship classes, which represent the relationships between entities. Entities are identified with a name and attributes.

## ER-notation used in the lectures

As of May 2014. Described in Norwegian.

__Errata:__ For "Total spesialisering", there is a typo. It should be "alle entiteter i superklassen må være med i minst én subklasse" (all entities in the superclass must be a member of at least one subclass).

![ER-notation part 1](http://i.imgur.com/P8ZPGlo.png)
![ER-notation part 2](http://i.imgur.com/dIxp7Lx.png)

# Relational algebra

Used to describe how data can be organized into sets. It is what modern query languages are built on. A statement in relational algebra can be given either as text or as a tree.

# Database normalization

Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency.

## Normal forms (aka NF)

A normal form is a set of criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The most common normal forms are:

|| 1NF || A table represents a relation and has no repeating groups. All attribute values are required to be atomic. ||
|| 2NF || No non-prime attribute in the table is partially dependent on any candidate key. ||
|| 3NF || Every non-prime attribute in the table is directly (non-transitively) dependent on every candidate key. ||
|| BCNF || Every non-trivial functional dependency is a dependency on a superkey. ||

Note that to meet the requirements of a normal form, a table must also meet all the requirements of the lower normal forms.

## Functional Dependencies

A functional dependency, written $ A \rightarrow B $, on some relation (table) where A and B are sets of attributes (columns) in the relation, means that if two tuples have the same values in A, they must have the same values in B.
$$ A \rightarrow B \Longleftrightarrow t_{n}[A]=t_{m}[A] \Rightarrow t_{n}[B]=t_{m}[B] $$

If more than one attribute appears on either side of the arrow, they should be AND'ed together.

### Rules for functional dependencies

|| Reflexivity || If Y is a subset of X, then $ X \rightarrow Y $ ||
|| Augmentation || If $ X \rightarrow Y $, then $ XZ \rightarrow YZ $ ||
|| Transitivity || If $ X \rightarrow Y $ and $ Y \rightarrow Z $, then $ X \rightarrow Z $ ||

You'll get far with these, but they can also be used to derive:

|| Union || If $ X \rightarrow Y $ and $ X \rightarrow Z $, then $ X \rightarrow YZ $ ||
|| Decomposition || If $ X \rightarrow YZ $, then $ X \rightarrow Y $ and $ X \rightarrow Z $ ||
|| Pseudotransitivity || If $ X \rightarrow Y $ and $ WY \rightarrow Z $, then $ WX \rightarrow Z $ ||
|| Composition || If $ X \rightarrow Y $ and $ Z \rightarrow W $, then $ XZ \rightarrow YW $ ||

So what can these be used for?

__Closure of a set of functional dependencies__

We have a set of functional dependencies F. The closure of F, written $F^{+}$, is the set of all functional dependencies that are logically implied by F. In more human terms, it is the set of all possible functional dependencies that can be derived from the functional dependencies of F using the rules specified above.

As well as finding the closure of a set of functional dependencies, you might want to work out the closure of one or more attributes, using the set F. (This is the set of all things you can know, knowing only the values of this or these attributes.) The algorithm can be described with the following steps (see the sketch after this list):

1. Start with a set $S$, the attributes that you want to find the closure of.
2. For the set $F$ of functional dependencies, look at each functional dependency and see if you have all the left-hand-side attributes in $S$. If so, add all the right-hand-side attributes of this particular functional dependency to a set of new attributes, $N$.
3. Now, $S = S \cup N$.
4. If $N$ was empty, you have found all possible attributes that can be known from the initial $S$. If $N$ was not empty, repeat from step 2 – there might still be territory to explore.
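A minimal sketch of this attribute-closure algorithm in Python, assuming functional dependencies are represented as pairs of attribute sets (the relation and FDs in the example are made up for illustration):

    def attribute_closure(attrs, fds):
        """Compute the closure of a set of attributes under a set of FDs.

        attrs: iterable of attribute names, e.g. {"A", "B"}
        fds:   list of (lhs, rhs) pairs of attribute sets, e.g. [({"A"}, {"C"})]
        """
        closure = set(attrs)
        changed = True
        while changed:                      # repeat until no new attributes are added
            changed = False
            for lhs, rhs in fds:
                # if all left-hand-side attributes are known, the right-hand side follows
                if lhs <= closure and not rhs <= closure:
                    closure |= rhs
                    changed = True
        return closure

    # Hypothetical example: R(A, B, C, D) with F = {A -> B, B -> C, CD -> A}
    F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"A"})]
    print(attribute_closure({"A"}, F))        # closure of A is {A, B, C}, so A is not a key
    print(attribute_closure({"A", "D"}, F))   # closure of AD is {A, B, C, D}, so AD is a superkey

This is also the quickest way to check whether an attribute set is a superkey: its closure must contain every attribute of the relation.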
__Equivalent functional dependencies__

Two sets of functional dependencies A and B are said to be equivalent if the closures of their sets of functional dependencies are the same. However, you'll not want to do this by hand on your exam (unless you possess the ability to travel back through time). The quick and easy way is to see if you can apply the rules above to derive A from B as well as B from A. If you're able to derive every functional dependency in each set from the other, they are equivalent.

## Desirable properties of a relational decomposition

There are four properties that a decomposition of a relation $ R $ into several relations $ R_{1}, R_{2}, \ldots, R_{n} $ should possess or strive to attain:

|| __Property__ || __Explanation__ ||
|| Preservation of attributes || All attributes of the original relation should be present in at least one of the new relations, so that no attributes are _lost_. ||
|| Preservation of functional dependencies || All functional dependencies of the original relation should be preserved. (This is not always possible when decomposing into higher normal forms.) ||
|| Lossless join || Also called the non-additive join property. When the tables are joined, no additional tuples should be generated. ||
|| Highest possible normal form || The individual relations of the decomposition should attain the highest normal form possible. ||

The properties are described in detail in chapter 15.2 in Elmasri/Navathe.

### Nonadditive join test for a binary decomposition

A binary decomposition is a decomposition of one relation into two (and only two) relations. As mentioned above, a nonadditive (or lossless) join is a join in which no additional tuples are generated. (In other words, this is really a test of whether it is possible to regenerate the original table from the two new ones.) $F^{+}$ is the closure of the functional dependencies of the original relation $R$, which has been decomposed into the two relations $R_{1}$ and $R_{2}$ – a binary decomposition. The binary decomposition has the lossless join property if either:

- The functional dependency $((R_{1} \cap R_{2}) \rightarrow (R_{1} - R_{2}))$ is in $F^{+}$, or
- The functional dependency $((R_{1} \cap R_{2}) \rightarrow (R_{2} - R_{1}))$ is in $F^{+}$.

## Keys

|| Superkey || A set of attributes of a relation schema that is unique for each tuple. I.e. if S is a superkey of the relation schema R, then no two tuples in R will have the same combination of values in S: for $ t_{n}, t_{m}\in R $, $ n\neq m $: $ t_{n}[S]\neq t_{m}[S] $. ||
|| Key || A minimal superkey: removing any of the key's attributes results in it ceasing to be a superkey. ||
|| Candidate key || If a relation has more than one key, they are called candidate keys: a candidate key is a set of attributes that qualifies as a key. ||
|| Primary key || Whichever candidate key is (more or less arbitrarily) chosen as the relation's key. ||
|| Prime attribute || An attribute that is part of one or more candidate keys. ||

# Types of failures

A database system may fail, and there are different types of failures. Failures of type 1-4 are common, and the system is expected to recover from these automatically. Failures of type 5 and 6 are less common, and recovering from them can be a major undertaking.

1. A computer failure / crash
2. A transaction or system error
3. Local errors or exception conditions detected by the transaction
4. Concurrency control enforcement
5. Disk failure
6. Physical problems and catastrophes

# Database storage structures

Databases are typically stored on secondary (non-volatile) storage.

## List of storage structures

|| Heap file || The simplest storage structure: an unordered list of records. Records are inserted at the end of the heap file, giving O(1) insertion. Retrieval requires a linear search ( O(n) ), and is therefore quite inefficient for large values of n. Deleted records are only marked as deleted, and not actually removed. This is O(1) (in addition to the time it takes to retrieve the record to be deleted). Highly volatile heap files must be restructured often to prune away records that are marked as deleted. ||
|| Hash buckets || Hash functions directly calculate the page address of a record based on values in the record. A good hash function has an even spread across the address space. When doing calculations on hash buckets (say, for an exam), it is safe to assume about a 60% fill rate. Ignoring collisions, hash buckets are O(1) for retrieval, insertion, modification and deletion. Hash buckets often suck for range retrieval. ||
|| B+-trees || A pretty good all-round tree storage structure. It's always balanced, and the indexed keys are in sorted order. It is usually safe to assume they are of height 3 or 4. Retrieval, insertion, modification and deletion are O(log n). Good for volatile data, as the index dynamically expands and contracts as the data set grows and shrinks. Less optimal for non-volatile/stable files – use ISAM for that. ||
|| ISAM || (Indexed Sequential Access Method) ||

## Index and storage concepts

First some vocabulary:

|| Index field || A field or attribute of the record/row that is indexed. ||
|| Primary index || An index structure that indexes the primary key of the records. ||
|| Secondary index || An index which is not a primary index. ||
|| Clustered index || An index on a table where the records are stored physically in the same order as the index entries. In practice we use either a B+-tree or a hash index. ||

The point of an index is to make searching faster. The index points to where the actual data for the key is located. So now that we have the structures and some vocabulary, let's look at some ways to implement storage:

* __Clustered B+-tree/clustered index__: We use a B+-tree index on the primary key. The data is stored at the leaf level of the tree (thus a clustered index).
* __Heap file and B+-tree__: The table is stored in a heap file, and we use a B+-tree index on the primary key.
* __Clustered hash index__: Hash index on the primary key. A clustered index storing the actual records of the table.

### Hashing schemes

#### Static hashing schemes

These have a fixed number of buckets (static).

__Internal hashing__ is used for hashing internal files. It's implemented as a hash table through the use of an array of records, e.g. an array from index $0$ to $M-1$; it then has M slots. We apply a hash function $h(k)$ to the keys. The hash function should give a number between $0$ and $M-1$; $h(k) = k \bmod M$ is often used. Two keys hashing to the same bucket is called a collision. This can be resolved by:

* Open addressing: if there's a collision, you go through the subsequent positions until you find an empty one.
* Chaining: We extend the array with a number of overflow locations, and add a pointer field to each record location. If there's a collision we place the record in an unused overflow position, and set the pointer of the occupied hash address to the overflow location.
* Multiple hashing: if there's a collision we use another hash function and hash to another bucket. If the second hash function also leads to a collision, open addressing is used.

__External hashing for disk files__: The target space is made up of buckets, each of which holds multiple records. We hash to one bucket. For handling collisions we use a version of chaining where each bucket holds a pointer to a linked list of overflow records for that bucket.

#### Extendible hashing

Extendible hashing treats a hash as a bit string, and uses a trie (directory) for bucket lookup. Re-hashing is done incrementally, one bucket at a time, as needed. Both LSB and MSB implementations are common.

#### Linear hashing

You start with $M$ buckets (numbered $0$ to $M-1$) and a hash function $h_{i}(k) = k \bmod M$. When a bucket overflows, the next bucket in round-robin order is split in two, and its records are redistributed using a new hash function $h_{i+1}(k) = k \bmod 2M$. A counter $n$ keeps track of how many buckets have been split so far; initially $n=0$, and it is incremented on each split. On retrieval you first compute $h_{i}(k)$; if $h_{i}(k) < n$ that bucket has already been split, and you apply $h_{i+1}(k)$ instead. Over time you'll have $2M$ buckets and all buckets will use $h_{i+1}$, at which point $h_{i+1}$ takes over the role of $h_{i}$. Linear hashing thus allows the hash table to expand one bucket at a time, round-robin.
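A toy sketch of the addressing and splitting rules described above, assuming a simplified in-memory bucket layout and a split triggered by bucket overflow (the class, its parameters and the bucket capacity are illustrative assumptions, not a prescribed implementation):

    class LinearHash:
        """Toy linear hashing table: illustrates h_i / h_{i+1} and round-robin splits."""

        def __init__(self, M=4, capacity=2):
            self.M = M                      # current number of "original" buckets
            self.n = 0                      # number of buckets split so far
            self.capacity = capacity        # records per bucket before a split is triggered
            self.buckets = [[] for _ in range(M)]

        def _address(self, key):
            b = key % self.M                # h_i(k)
            if b < self.n:                  # bucket already split -> use h_{i+1}(k)
                b = key % (2 * self.M)
            return b

        def insert(self, key):
            self.buckets[self._address(key)].append(key)
            if len(self.buckets[self._address(key)]) > self.capacity:
                self._split()

        def _split(self):
            # split bucket n (round-robin), redistributing its keys with h_{i+1}
            self.buckets.append([])
            old = self.buckets[self.n]
            self.buckets[self.n] = [k for k in old if k % (2 * self.M) == self.n]
            self.buckets[self.M + self.n] = [k for k in old if k % (2 * self.M) != self.n]
            self.n += 1
            if self.n == self.M:            # every original bucket is split: h_{i+1} becomes h_i
                self.M *= 2
                self.n = 0

        def lookup(self, key):
            return key in self.buckets[self._address(key)]

    h = LinearHash()
    for k in (3, 7, 11, 15, 19):            # all hash to bucket 3 under h_i
        h.insert(k)
    print(h.lookup(11), h.lookup(4))        # True False

Note how the bucket that overflows is not necessarily the one that gets split – this is the drawback listed under linear hashing in the table below.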
#### Pros & cons of hashing schemes

|| __Hashing__ || __Pros__ || __Cons__ ||
|| Static || Simple if you know the number of records in advance || Overflow chaining ||
|| Extendible || Handles dynamic data, splits one block at a time || The directory (catalog) may double in size on a split ||
|| Linear || No directory (catalog) needed, handles dynamic data || Splits are not targeted at the bucket that overflowed ||

### B+ trees

A B+-tree is a storage structure that is constructed as an $n$-ary tree. We expect that you already have some basic knowledge about trees. In the tree, each node (including the leaves) corresponds to one disk block.

#### Indexing of a B+ tree

The way the indexing of a B+-tree works, here demonstrated with an unclustered B+-tree:

- All the values of the index field, e.g. the numbers from 0 to 99, are stored sorted at the leaf level. Let's imagine that there is room for 10 index values in each of these leaves, and that each leaf node is completely full. Each index value in the leaf level points to the place where the data of the record with this index value is stored. In addition to this, at the end of each leaf node, there is a pointer to the next leaf node, which makes it possible to iterate through the leaf level.
- In a clustered B+ tree, there are no data pointers at the leaf level; instead, the record that the value points to is stored there. If the records are larger than data pointers, this leads to leaf nodes having a lower capacity than internal nodes.
- The nodes at the levels above the leaf nodes serve as an index of the level below. The way this works is that each value $V_i$ ($0 < i \le n$) is stored with tree pointers to its left and right, $P_{i-1}$ and $P_i$ respectively. The left pointer points to a node where all values are less than $V_i$, and the right pointer points to a node with values greater than or equal to it. In our example, these are leaf nodes, but they might be other internal nodes. An internal node looks like this: $<P_0, V_1, P_1, \ldots , P_{n-1}, V_n, P_n>$.
- In our example, the internal node would look like this: $<P_0, 10, P_1, 20, P_2, \ldots, P_8, 90, P_9>$.

Usually, when looking for some value, we start at the root level. Let's say that the index field is the employee number of some employees, and we want to look for "Douglas", who has employee number 42. We start at the root node (which in our example is the only internal node), going through the values. We continue until we find the numbers 40 and 50 with a tree pointer in between them. Because $40 \le 42 < 50$, we follow that pointer and come to a leaf node. We look through the leaf node until we find 42 and its data pointer, which points to employee 42's record.

It's important to remember that in a B+-tree a block should be at least 50% filled. On average a block will be 67% filled. Some operate with a 69% fill degree.

#### Constructing and inserting into a B+ tree

Before reading this, it is important that you understand the [indexing of a B+ tree](#indexing-of-a-b-tree). We will now have to elaborate on the differences between the different kinds of nodes:

- The __root node__ is at the top of the tree, and is the node at which we start our searches – both for lookup and insertion. A root node is no different from an internal node in its contents, but it has a certain meaning when inserting.
- The __internal nodes__ are the nodes that serve as indices for the leaf nodes.
- The __leaf nodes__ are at the bottom level of the tree. They store all occurrences of the index values with the actual data (or a pointer to it, in the unclustered case) connected to them. All leaf nodes have a pointer to the next leaf node.

For this explanation, we will be concerned with the placement of the indices. We take it for granted that you understand where data pointers in internal and leaf nodes are located. The algorithm for insertion of a new value $V$ is as follows:

1. Perform a search to determine which leaf node should contain $V$.
2. If the leaf node is _not full_, insert $V$. You will not have to follow any more steps.
3. If the leaf node is _full_, it must be split (see the sketch below):
    1. Split the node at the middle, creating one new node for the elements right of the middle. If there is an odd number of records in the node, you have to decide whether your algorithm puts the middle element in the left or the right node. Both options are fine, but because algorithms aren't random, you _have to_ stick with your choice.
    2. Insert the new leaf's smallest key, as well as a pointer to it, into the node above (the parent).
4. If there is no room for the new key in the internal node, we have to do an _internal split_:
    1. An _internal split_ is equal to the leaf node split, _except_: When we split an internal node in two, we find the middle element. (Again, if the number of elements is even, make a decision and stick with it.) _Move_ the middle element up to the level above. This means that the element will only occur at the level above, not the level below (as opposed to leaf node splitting).
    2. Continue this process of internal splitting until you reach a parent node that does not need to be split.
    3. You might not reach a parent node that doesn't have to be split: the root node can be full, and you have to do a _root node split_. This isn't as hard as it sounds; it only means that the tree will grow in height. It's as simple as doing an internal split of the root node, moving the middle element up to a newly constructed node, which becomes the new root node.

_Constructing the tree_ is no harder than simply starting with one leaf node (which is also, technically, the root node), and following this algorithm rigorously. Create the first internal node when you have to split the root node. When drawing this, it is obligatory that you draw arrows for the data pointers in the tree.
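A minimal sketch of the two split rules (steps 3 and 4 above), assuming node contents are kept as sorted Python lists and that the middle element goes to the right node – both the representation and that tie-breaking choice are illustrative assumptions:

    def split_leaf(keys):
        """Split an overfull leaf's sorted key list in two.

        Returns (left_keys, right_keys, key_to_copy_up): in a leaf split the
        smallest key of the new right node is *copied* into the parent.
        """
        mid = len(keys) // 2                 # middle element goes to the right node
        left, right = keys[:mid], keys[mid:]
        return left, right, right[0]

    def split_internal(keys):
        """Split an overfull internal node's sorted key list in two.

        Returns (left_keys, right_keys, key_to_move_up): in an internal split the
        middle key is *moved* up to the parent and appears only there.
        """
        mid = len(keys) // 2
        return keys[:mid], keys[mid + 1:], keys[mid]

    # Example: a node with room for 4 keys receives a 5th
    print(split_leaf([10, 20, 30, 40, 50]))      # ([10, 20], [30, 40, 50], 30)
    print(split_internal([10, 20, 30, 40, 50]))  # ([10, 20], [40, 50], 30)

The difference between _copying_ a key up (leaf split) and _moving_ it up (internal split) is exactly the distinction made in steps 3 and 4.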
# Transactions

A transaction is a unit of work performed within a DBMS. Transactions provide reliability and isolation. Transactions are "all or nothing", in the sense that the entire unit of work must complete without error, or have no effect at all. It is important to note that a transaction is the _low-level_ execution of some database query, described purely by reading and writing values (with more advanced operations that we'll come to discuss). This sequence of operations might represent an operation such as "Update employee John's salary" or "Find the artist named Justin Bieber and remove him from my library". However, because it is low-level (as opposed to the high-level SQL query that represents it), you won't see the actual intention when looking at the transaction; you'll only see the read, write and other low-level operations involved.

## ACID: Atomicity, consistency, isolation and durability

Transactions must conform to the ACID principles:

|| Atomicity || A unit of work cannot be split into smaller parts – a unit of work either completes entirely, or fails entirely. ||
|| Consistency || A transaction always leaves the database in a valid state upon completion, with all data valid according to defined rules such as constraints, triggers, cascades etc. ||
|| Isolation || No transaction can interfere with another transaction. ||
|| Durability || When a transaction completes, it remains complete, even in the case of power loss, crashes etc. ||

A typical transaction follows this lifecycle:

1. Begin the transaction
2. Execute a set of data manipulations/queries
3. If no errors occur, commit the transaction and end it
4. If errors occur, roll back the transaction and end it

## Schedule (or History)

Schedules are lists of operations (usually data reads/writes, locks, commits etc.) ordered by time, e.g.

$$ S_{a}: r_1(X); w_1(X); c_1; $$

S is the name of the schedule. $r_{n}$ is a read operation done by transaction n, and $w_{n}$ is a write operation done by transaction n. A $c$ means commit (the transaction's changes can be written to disk), and $a$ means abort (the changes made by the transaction are rolled back).

Schedules can be classified into different classes, based on either serializability or recoverability. It is important to note that these classifications are completely independent – that is, knowing only the recoverability of a schedule tells us nothing about its serializability, and vice versa.

### Recoverability of a schedule

All schedules can be grouped into either _nonrecoverable_ or _recoverable_ schedules. A schedule is recoverable if it is never necessary to roll back a transaction that has already been committed. In the group of recoverable schedules, we have a subgroup: schedules that _Avoid Cascading Aborts_ (ACA), and this again has a subgroup: _strict_ schedules.

$$ Recoverable \supset ACA \supset Strict $$

Here are the different levels of recoverability:

|| Recoverable || A schedule is recoverable if every transaction commits after the transactions it has read from. "Read from" refers to reading a value that another transaction has written. E.g. $S: w_1(A), r_2(A), c_1, c_2$. If a schedule fails to meet this criterion, it is _nonrecoverable_. ||
|| Avoids Cascading Aborts (ACA) || When one transaction has changed one or more values and must be aborted, transactions that have read those values must also be aborted. To avoid this, transactions may only read values written by committed transactions. However, it is completely okay to write to an item that has been written to by another uncommitted transaction. For example: $S: w_1(X), r_2(Y), w_1(X), c_1, r_2(X), c_2$ ||
|| Strict || A strict schedule doesn't allow any reads from or writes to values that have been changed by an uncommitted transaction; such a schedule simplifies the recovery process. For example: $S: w_1(X), r_2(Y), c_1, w_2(X), c_2$ ||
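A minimal sketch of how the basic "recoverable" criterion can be checked mechanically, assuming a schedule is represented as a list of (operation, transaction, item) tuples – the representation is an illustrative assumption, and aborts are ignored for brevity:

    def is_recoverable(schedule):
        """Check the basic recoverability criterion: if T2 reads an item last
        written by T1, then T1 must commit before T2 commits.

        schedule: list of tuples like ('w', 1, 'X'), ('r', 2, 'X'), ('c', 1, None)
        """
        commit_pos = {t: i for i, (op, t, _) in enumerate(schedule) if op == 'c'}
        last_writer = {}                  # item -> transaction that wrote it last
        reads_from = set()                # (reader, writer) pairs

        for op, t, item in schedule:
            if op == 'w':
                last_writer[item] = t
            elif op == 'r' and item in last_writer and last_writer[item] != t:
                reads_from.add((t, last_writer[item]))

        # every committing reader must commit after the writer it read from
        return all(
            reader not in commit_pos or
            (writer in commit_pos and commit_pos[writer] < commit_pos[reader])
            for reader, writer in reads_from
        )

    # The "Recoverable" example from the table above: w1(A) r2(A) c1 c2
    print(is_recoverable([('w', 1, 'A'), ('r', 2, 'A'), ('c', 1, None), ('c', 2, None)]))  # True
    # Swapping the commits makes it nonrecoverable: w1(A) r2(A) c2 c1
    print(is_recoverable([('w', 1, 'A'), ('r', 2, 'A'), ('c', 2, None), ('c', 1, None)]))  # False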
### Serializability of a schedule

Ideally, every transaction would be executed from start to end (commit or abort) without any other transactions active (a _serial_ schedule). However, this is a luxury we cannot afford when working with databases, mainly because a transaction spends a lot of time waiting for data to be read from disk. Therefore, transactions are executed in an interleaved fashion: when there is available time, another transaction jumps in and does some of the work it wants to do.

It is apparent that interleaving operations introduces some new problems. For instance: $A$ is your bank account. Transaction $T_1$ reads the amount in $A$ and adds 500 dollars, while $T_2$ reads the amount in the account and adds interest (i.e. multiplies the value by some number). As you can imagine, reordering the operations might affect the end result (if $T_2$ reads while $T_1$ prepares to add the 500 dollars, the result of $T_1$ will be forgotten when $T_2$ writes its value). If we can interleave the operations so that the end result is guaranteed to be the same as if they executed serially, the schedule is called _serializable_. In short:

|| Serial || Transactions are non-interleaved. No transaction starts before the previous one has ended. Example: $D: r_1(X); w_1(X); c_1; w_2(X); c_2$. ||
|| Serializable || Schedules that have outcomes equivalent to a serial schedule are serializable. A schedule is conflict serializable if and only if its precedence graph is acyclic (see below). ||

This course only looks at _conflict serializability_. If you are interested in what this actually means, consult the book; you only have to be able to test for conflict serializability to pass this course. The best test for conflict serializability is constructing a _precedence graph_. Before looking at the precedence graph, it is necessary to define the concept of a _conflict_ between transactions. Two operations are in conflict if all of these criteria are met:

1. They belong to different transactions.
2. They access the same item X.
3. _At least one of them_ is a write operation.

For a schedule $S$, the precedence graph is constructed with the following steps:

1. For each transaction $T_{i}$ in $S$, draw a node and label it with its transaction number.
2. Go through the schedule and find all conflicts (see criteria above). For each conflict, draw an edge in the graph, going from the transaction that accessed the item first to the transaction that accessed the item last in the conflict.

If there are no cycles in the graph, the schedule is conflict serializable.
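A minimal sketch of the precedence-graph test, reusing the (operation, transaction, item) tuple representation from the sketch above (an assumption for illustration, not course notation):

    from itertools import combinations

    def precedence_graph(schedule):
        """Build the precedence-graph edges of a schedule given as (op, txn, item) tuples."""
        edges = set()
        for (op1, t1, x1), (op2, t2, x2) in combinations(schedule, 2):
            # two operations conflict if they belong to different transactions,
            # access the same item, and at least one of them is a write
            if t1 != t2 and x1 is not None and x1 == x2 and 'w' in (op1, op2):
                edges.add((t1, t2))          # t1 accessed the item first
        return edges

    def is_conflict_serializable(edges):
        """The schedule is conflict serializable iff the precedence graph is acyclic."""
        graph = {}
        for a, b in edges:
            graph.setdefault(a, set()).add(b)
        visiting, done = set(), set()

        def has_cycle(node):
            visiting.add(node)
            for nxt in graph.get(node, ()):
                if nxt in visiting or (nxt not in done and has_cycle(nxt)):
                    return True
            visiting.discard(node)
            done.add(node)
            return False

        return not any(has_cycle(n) for n in list(graph) if n not in done)

    # w1(X) r2(X) w2(Y) r1(Y): T1 -> T2 (on X) and T2 -> T1 (on Y) form a cycle
    S = [('w', 1, 'X'), ('r', 2, 'X'), ('w', 2, 'Y'), ('r', 1, 'Y')]
    print(precedence_graph(S))                            # {(1, 2), (2, 1)}
    print(is_conflict_serializable(precedence_graph(S)))  # False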
## Types of concurrency problems

Multiple transactions executing with their operations interleaved might introduce problems. Here are the four different kinds; the first three definitions are taken from the Wikipedia article on [concurrency control](https://en.wikipedia.org/wiki/Concurrency_control#Why_is_concurrency_control_needed.3F) (as of May 24th, 2014).

|| __Dirty read__ || A transaction reads a value written by a transaction that is later aborted. This value disappears from the database upon abort, and should not have been read by any transaction ("dirty read"). The reading transactions end with incorrect results. ||
|| __Lost update__ || A second transaction writes a second value of a data item on top of a first value written by a first concurrent transaction, and the first value is lost to other transactions running concurrently which need, by their precedence, to read the first value. The transactions that have read the wrong value end with incorrect results. (Wikipedia, Concurrency control) ||
|| __Incorrect summary__ || While one transaction takes a summary over the values of all the instances of a repeated data item, a second transaction updates some instances of that data item. The resulting summary does not reflect a correct result for any (usually needed for correctness) precedence order between the two transactions (if one is executed before the other), but rather some random result, depending on the timing of the updates, and whether certain update results have been included in the summary or not. ||
|| __Unrepeatable read__ || Transaction $T_1$ reads value A. Another transaction $T_2$ writes to A before $T_1$ reads it a second time, and the value of A has (inexplicably, to $T_1$) changed – the read is thus unrepeatable. ||

Another problem that is common, but not usually grouped together with these, is __phantoms__. A phantom is a value in a group that is changed while some other transaction does work on that group. For example: $T_1$ wants to find the average salary of all employees working in the department named "Funny hats", and it reads all of the values twice (for its own mysterious reasons). $T_2$ wants to find the employee named "Oliver Klozoff", who works in the "Funny hats" department, and lower his salary by one dollar. If $T_1$ reads all the values, including Oliver's salary, and $T_2$ then jumps in and changes the salary, his salary is now a _phantom value_. When $T_1$ returns and reads all of the values again, it discovers that one of the values has changed – a _phantom read_ has occurred.

### The isolation levels of SQL

SQL has different isolation levels that are used to combat these problems, each employing some of the techniques described in other parts of this article. What you should be concerned with is which problems _might_ appear when using the different isolation levels. In MySQL (InnoDB), REPEATABLE READ is the default isolation level.

|| __Isolation level__ || __Dirty read__ || __Unrepeatable read__ || __Phantoms__ ||
|| Read uncommitted || Yes || Yes || Yes ||
|| Read committed || No || Yes || Yes ||
|| Repeatable read || No || No || Yes ||
|| Serializable || No || No || No ||

## Protocols for concurrency control

Concurrency control protocols are sets of rules that guarantee serializability, i.e. they guarantee that the transactions produce the same result as if they were executed serially, without overlapping in time. 2PL (see below) is the only protocol that is part of the curriculum.

### Two-phase locking (2PL) protocols

A lock is a variable associated with a data item. A lock is basically a transaction's way of saying to other transactions "I'm reading this data item; don't touch" or "I'm writing to this data item; don't even look at it".

|| __Binary locks__ || A binary lock on an item can have two states: locked or unlocked (1 or 0). If the lock on an item is 1, that data item cannot be accessed by another transaction. When a transaction starts to operate on an item, it locks it; when it's done, it unlocks the item. If a transaction wants to operate on a locked item, it waits until the item is unlocked (or aborts and recommences at some other time). Binary locks are not usually used, because they limit concurrency. ||
|| __Shared/exclusive (or read/write) locks__ || As said, binary locking is too restrictive. This leads us to multi-mode locks. Here there are three locking operations that can be done on a data item X: read_lock(X), write_lock(X) and unlock(X). This means there are three possible states for the data item: _read-locked_ (also called _share-locked_), _write-locked_ (also called _exclusive-locked_) and _unlocked_. ||

By making sure that the locking operations occur in _two phases_, we can guarantee serializable schedules, and thus we get two-phase locking protocols. The two phases are:

1. _Expanding (growing)_: New locks on items can be acquired but none can be released. (If the variation allows, locks can also be upgraded, which means converting a read lock to a write lock.)
2. _Shrinking_: Existing locks can be released. No new locks can be acquired. (If the variation allows, write locks can be downgraded to read locks.)
#### Variations of two-phase locking

|| Basic 2PL || The most liberal version of 2PL; works as described above. Has a growing phase and a shrinking phase. ||
|| Conservative 2PL (aka static 2PL) || Requires a transaction to set all its locks before the transaction begins. If it is unable to set all locks, it aborts and tries again at a later point in time. This eliminates deadlocks (see [Problems that may occur in two-phase locking](#problems-that-may-occur-in-two-phase-locking)), but it might be hard to know beforehand which items must be locked – thus it is hard to implement. ||
|| Strict 2PL || A transaction does not release any _write locks_ until it has committed or aborted, but it can release read locks earlier. Guarantees strict schedules. ||
|| Rigorous 2PL || A transaction T does not release _any of its locks_ (neither read nor write) until after it commits or aborts. (Most commonly used today.) ||

##### Pros & cons of the two-phase locking variants

|| __Locks__ || __Pros__ || __Cons__ ||
|| Basic || Releases data early || No lock can be released before all locks have been acquired ||
|| Conservative || Deadlock-free || Limited concurrency ||
|| Strict || Implies a strict history (schedule) || Like basic, but write locks must additionally be held until commit/abort ||
|| Rigorous || Easy to implement || More limited concurrency than strict ||

### Problems that may occur in two-phase locking

__Deadlock__: A deadlock occurs when each transaction $T$ in a set $S$ of two or more transactions is waiting for some item that is locked by some other transaction $T'$ in the same set; the transactions all hold some item hostage, and are not willing to let go until they get what they need to continue. Ways to deal with it:

- _Conservative 2PL_: This requires every transaction to lock all the items it needs in advance, which avoids deadlock.
- _Deadlock detection_: This can be done by the system making and maintaining a _wait-for graph_, where each node represents an executing transaction. If a transaction T has to wait, it creates a directed edge to the transaction it's waiting for. There's a deadlock if the graph has a cycle. A challenge is for the system to know when to check for cycles. This can be solved by checking every time an edge is added, but that may cause a big overhead, so you may also want to use some other criterion, e.g. checking when the number of executing transactions exceeds a certain threshold.
- _Timeouts_: This gives a transaction a _time to live_ beforehand. If it has not finished executing within that time, it is killed off. This is simple and has a small overhead, but it may be hard to estimate how long a transaction will take to finish, which may result in transactions being killed off needlessly.

_The last two are the most used,_ but there are also two other schemes based on a _transaction timestamp_ TS(T), usually the time at which the execution started. We have two transactions A and B; B has locked an item, and A wants access (a small sketch of the two rules is given after this list):

- _Wait-die_: A gets to wait if it's older than B; otherwise it is aborted ("dies") and restarted later with the same timestamp.
- _Wound-wait_: If A is older than B, we abort B (A _wounds_ B) and restart it later with the same timestamp. Otherwise A is allowed to wait.
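A minimal sketch of the two timestamp rules, assuming that a smaller timestamp means an older transaction (the function and its parameters are illustrative):

    def resolve(requester_ts, holder_ts, scheme="wait-die"):
        """Decide what happens when a transaction requests a lock held by another.

        requester_ts, holder_ts: transaction timestamps; smaller means older.
        Returns which action is taken, and on which transaction.
        """
        older = requester_ts < holder_ts
        if scheme == "wait-die":
            # older requester waits; younger requester dies (aborts, keeps its timestamp)
            return "requester waits" if older else "requester aborts"
        elif scheme == "wound-wait":
            # older requester wounds (aborts) the holder; younger requester waits
            return "holder aborts" if older else "requester waits"
        raise ValueError("unknown scheme")

    # A (ts=5) requests an item held by B (ts=9): A is older
    print(resolve(5, 9, "wait-die"))     # requester waits
    print(resolve(5, 9, "wound-wait"))   # holder aborts
    print(resolve(9, 5, "wait-die"))     # requester aborts
    print(resolve(9, 5, "wound-wait"))   # requester waits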
Two methods that don't use timestamps have been proposed to eliminate unnecessary aborts:

- _No waiting (NW)_: A transaction that cannot obtain a lock is aborted and restarted after a time delay.
- _Cautious waiting_: If $T_{B}$ is not itself waiting for some item, then $T_{A}$ is allowed to wait; otherwise $T_{A}$ is aborted.

__Starvation__: This occurs when a transaction cannot proceed for an indefinite period of time, while other transactions operate normally. This may happen when the waiting scheme gives priority to some transactions over others. Use a fair waiting scheme, e.g. first-come-first-served or increasing priority with waiting time, and this won't be a problem.

## Caching/buffering disk blocks

Disk pages that are to be updated by a transaction are usually cached into main memory buffers and updated in memory before they're written back to disk. When the system needs more space it replaces blocks in the cache. Replacing blocks is also called flushing them.

|| Dirty bit || Indicates whether or not the buffer has been modified. On flush, a block is only written to disk if its dirty bit is 1 (modified). One can also store the transaction id(s) of the transactions that changed the block. ||
|| Pin-unpin bit || A page in the cache is pinned (bit is 1) if it cannot be written to disk yet, e.g. if the transaction that changed the page isn't committed yet. ||

__Flush strategies__

|| In-place updating || The buffer is written back to the original disk location. ||
|| Shadowing || Writes an updated buffer to a different disk location, so multiple versions of data items can be maintained. Shadowing is not usually used in practice. ||

- BFIM (before image): the old value of a data item
- AFIM (after image): the new value of a data item

## Policies used in transaction control

The following four policies govern when a block in the cache may, or must, be written back to disk:

|| NO-FORCE policy || Changes made to objects are not required to be written to disk in place at commit. Changes must still be logged to maintain durability, and can be applied to the object at a later time. This reduces seek time on disk for a commit, as the log is usually stored sequentially, while the objects can be scattered. Frequently updated objects can also merge accumulated writes from the log, thereby reducing the total number of writes to the object. Faster, but adds recovery complexity (REDO may be needed). ||
|| FORCE policy || Changes made to objects are required to be written to disk in place on commit – even dirty blocks are written. Sucks. ||
|| STEAL policy || Allows a transaction's changes to be written to non-volatile storage before the transaction commits. Faster, but adds rollback complexity. ||
|| NO-STEAL policy || Does not allow a transaction's changes to be written to non-volatile storage before it commits. Sucks. ||

Performance-wise, if anyone asks, STEAL/NO-FORCE is the way to go. In a STEAL/NO-FORCE scheme both REDO and UNDO might be required during recovery.

## The system log

A DBMS maintains a system log tracking all transaction operations in order to be able to recover from failures. The log must be kept on non-volatile storage, for obvious reasons. A log consists of records of different types:

1. [start\_transaction, transaction\_id]
2. [write\_item, transaction\_id, item\_id, old\_value, new\_value]
3. [read\_item, transaction\_id, item\_id]
4. [commit, transaction\_id]
5. [abort, transaction\_id]

## ARIES (aka Algorithms for Recovery and Isolation Exploiting Semantics)

ARIES is a recovery algorithm designed to work with a NO-FORCE, STEAL database approach.
ARIES has three main principles:

|| Write-ahead logging || Any change to an object is first recorded in the log, and the log must be written to non-volatile storage before the changes to the object are made. ||
|| Repeating history during redo || On restart after a crash, ARIES retraces the steps in history necessary to bring the database to the exact state it had at the time of the crash, and then undoes the transactions that were still active at the time of the crash. ||
|| Logging changes during undo || Changes made to the database during undo are logged to prevent them from being repeated. ||

Two data structures must be maintained to gather enough information for logging: the dirty page table (DPT) and the transaction table (TT):

|| Dirty page table (DPT) || Keeps a record of all the pages that have been modified and not yet written to disk. It also stores the LSN of the first log record that made each page dirty (recLSN). ||
|| Transaction table (TT) || Contains all transactions that are currently running, as well as the LSN of the last log record written by each transaction. ||

ARIES periodically saves the DPT and the TT to the log file, creating checkpoints (lists of active transactions), to avoid rescanning the entire log file in the case of a redo. This allows skipping pages that are not in the DPT, or that are in the DPT but have a recLSN greater than the log record's LSN.

In case of a crash, ARIES restores the database state in three steps:

1. __Analysis__: Identifies the dirty pages in the buffer and the set of transactions active at the time of the crash.
2. __Redo__: Reapplies updates from the log to the database, starting from the point determined by the analysis.
3. __Undo__: The log is scanned backwards and the operations of transactions that were active at the time of the crash are undone in reverse order.

# SQL

## Retrieving records

|| SELECT || Selects attributes. ||
|| FROM || Specifies where we want the attributes from. Does not have to be tables; can be other results. ||
|| WHERE || Gives a condition for the query. Operates on a single row (e.g. a single item in a store). ||
|| AS || Renames a column/attribute for further use in your query. ||
|| GROUP BY || Lets you specify some attribute you want the result grouped by. ||
|| HAVING || Conditions for a group. Operates on aggregates – multiple rows, e.g. a group of items in a store. ||
|| ORDER BY || How you want the result ordered: attributes and ASC/DESC. ||

Choice of extracted data. Usage: __SELECT__ _OPERATION_ (see [aggregate functions](https://www.wikipendium.no/TDT4145_Data_Modelling_and_Database_Systems#aggregate-functions) for more)

|| __OPERATION__ || __Explanation__ ||
|| MIN(column\_name) || Selects the smallest value of the specified column ||
|| MAX(column\_name) || Selects the largest value of the specified column ||
|| AVG(column\_name) || Selects the average value of the specified column ||

Note that WHERE gives conditions on individual rows, while HAVING gives conditions on groups.

### Simple select example with conditions

    SELECT attributes you want, use commas to separate
    FROM where you want this to happen
    WHERE condition;

For instance

    SELECT s.name, s.address
    FROM Student AS s, Institute AS i
    WHERE s.instituteID = i.ID AND i.name = 'IDI';

### Queries with grouped results

    SELECT attributes you want, use commas to separate
    FROM where you want this to happen
    GROUP BY attribute you want it grouped by
    HAVING some condition for the group;

You may add the following after HAVING if you want the groups sorted:
    ORDER BY some_attribute DESC/ASC;

You can also use a regular WHERE-statement together with GROUP BY. Example:

    SELECT Item.type, SUM(Item.price)
    FROM Item
    WHERE Item.price > 200
    GROUP BY Item.type
    HAVING SUM(Item.price) > 2000;

## Modifications of tables or the database

### Alter a table

Add an attribute:

    ALTER TABLE table_name ADD new_attribute_name DATATYPE(length) DEFAULT default_value;

Remove an attribute:

    ALTER TABLE table_name DROP attribute_name;

### Insert a new entry

    INSERT INTO table_name (column1, column2, column3, ...)
    VALUES (value1, value2, value3, ...);

### Delete an entry

    DELETE FROM where you want this to happen
    WHERE condition;

### Update an entry

    UPDATE table you want to update
    SET new values
    WHERE condition;

For instance

    UPDATE Pricelist
    SET price = 20
    WHERE price > 20 AND type = 'Beer';

## DDL operations

DDL stands for Data Definition Language, so the operations here are about defining the data and its structure.

|| CREATE TABLE name (...) || New table based on the definition given in the parentheses ||
|| CREATE VIEW query || Creates a view based on the query ||
|| CREATE ASSERTION || ||
|| CREATE TRIGGER || ||

### Views

A view is a statement that defines a table without actually creating the table. In other words, it is a virtual table composed of data from other tables. Views are often used to simplify queries and for multi-user control. To create a view, write a statement of the form:

    CREATE VIEW viewName AS (statement);

For instance:

    CREATE VIEW IDI_Students AS (
        SELECT s.name
        FROM Student AS s, Institute AS i
        WHERE s.instituteID = i.ID AND i.name = 'IDI'
    );

Some views may be updated, some may not. There are two standards:

- SQL-92, the one that most implementations use. Very restrictive regarding updating views, and only allows updating views that use projection or selection.
- SQL-99 is an extension of SQL-92. It will update an attribute in a view if the attribute comes from one particular table and the primary key of that table is included in the view.

## Functions

### Aggregate functions

|| AVG() || Returns the average value ||
|| COUNT() || Returns the number of rows ||
|| FIRST() || Returns the first value ||
|| LAST() || Returns the last value ||
|| MAX() || Returns the largest value ||
|| MIN() || Returns the smallest value ||
|| SUM() || Returns the sum ||

### Scalar functions

|| UCASE() || Converts a field to upper case ||
|| LCASE() || Converts a field to lower case ||
|| MID() || Extracts characters from a text field ||
|| LEN() || Returns the length of a text field ||
|| ROUND() || Rounds a numeric field to the number of decimals specified ||
|| NOW() || Returns the current system date and time ||
|| FORMAT() || Formats how a field is to be displayed ||

# Algorithms for natural join/equijoin

The JOIN operation is one of the most time-consuming operations, and there are many different ways of performing it. A two-way join joins two files on two attributes; we call the attributes the files are joined on the join attributes. The methods below are for two-way NATURAL JOIN or EQUIJOIN. There is pseudocode for them in the book.

## Nested-loop join
The default brute-force algorithm. It does not require any special access structures. It has an outer and an inner loop, and compares the records on the join attributes; if there is a match, they are joined.

- $O$ is the number of blocks in the outer loop.
- $I$ is the number of blocks in the inner loop.
- $b$ is the buffer space (in blocks).

$\text{I/O} = O + \lceil \frac{O}{b - 2} \rceil * I$

$\lceil \frac{O}{b - 2} \rceil$ is the number of iterations of the outer loop. For each outer iteration, the algorithm reads all the blocks in the inner loop. In addition, all the blocks in the outer loop have to be read once.
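A quick sanity check of the formula with made-up numbers (the block counts and buffer size are illustrative assumptions):

    from math import ceil

    def nested_loop_io(O, I, b):
        """I/O cost of a nested-loop join: read the outer file once, and read the
        whole inner file once per chunk of (b - 2) outer blocks."""
        return O + ceil(O / (b - 2)) * I

    # E.g. an outer file of 1000 blocks, an inner file of 500 blocks, 12 buffer blocks:
    print(nested_loop_io(1000, 500, 12))   # 1000 + ceil(1000/10) * 500 = 51000
    # Using the smaller file in the outer loop is cheaper:
    print(nested_loop_io(500, 1000, 12))   # 500 + ceil(500/10) * 1000 = 50500

As the second call shows, putting the smaller file in the outer loop gives the lower cost.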
## Single-loop join

If an index or hash key exists on one of the two join attributes – say attribute A of file X – then we retrieve each record of the other file, in our example file Y. This is done with one single loop over file Y. For each record y we use the access structure (index or hash key) to directly retrieve all matching records x from file X that satisfy x[A] = y[B], where B is the join attribute of Y.

- $i$ is the number of index levels for the attribute you're joining on.
- $r$ is the number of blocks one is looking for.
- $b$ is the number of blocks over which these are spread.

$\text{I/O} = b + r(i+3)$

## Sort-merge join

Given that both files are physically ordered on the join attributes, we can implement the join in the most efficient way possible: both files are scanned concurrently in order of the join attributes, and we join the records that match on the join attributes. If the files are not sorted, they have to be sorted first with some _external sorting_ algorithm. If we have two files of $b_1$ and $b_2$ blocks respectively, the total number of disk accesses needed to join them is $n = b_1 + b_2$. For each file that must be sorted, add the number of disk accesses needed for external sorting (see [The sort-merge algorithm for external sorting](#the-sort-merge-algorithm-for-external-sorting)).

### The sort-merge algorithm for external sorting

__TL;DR__:

- $b$: Number of blocks to be sorted.
- $n_{B}$: Number of available buffer blocks.
- $n_{R} = \lceil\frac{b}{n_{B}}\rceil$: Number of initial runs.
- $d_{M} = \min(n_{B}-1, n_{R})$: Degree of merging.

Total number of disk accesses: $(2*b) + (2*b)*\log_{d_{M}}(n_{R})$.

__The long version__:

The sort-merge algorithm is the only _external sorting algorithm_ covered in this course. The word _external_ refers to the fact that the data is too large to be stored in main memory, and thus is stored on some external storage, such as a hard drive. The algorithm is very similar to the classic [merge sort algorithm](https://en.wikipedia.org/wiki/Merge_sort). We will look at the total number of disk accesses needed for the algorithm, as well as giving an explanation of why it is so.

_Sort phase_: We start with a file of size $b$ blocks. We have $n_{B}$ available buffers in main memory, each able to store one block from the disk. We read a portion of as many blocks as possible ($= n_{B}$) into the main memory buffers; each of these portions is called a _run_. Each run is sorted by some internal sorting algorithm, and the result is written back to disk. The number of such _initial runs_ is called $n_{R}$, and it is equal to the total size divided by the total buffer space: $n_{R} = \lceil\frac{b}{n_{B}}\rceil$. Because each block has to be read from disk and then written back, the number of disk accesses needed for this phase is $2*b$, which we'll call $n_{SORT-PHASE}$.

_Merge phase_: When the algorithm has sorted each of the initial runs and written them to disk, it proceeds to a _merge pass_, which merges as many runs as possible.
In this phase, one of the buffers has to be reserved for the result of the merging. The rest of the buffers are used to store the blocks that are to be merged. The computer reads the first blocks from $n_{B}-1$ runs into the buffers – unless we have fewer than $n_{B}-1$ initial runs, in which case $n_{R}$ blocks are read into memory at a time. This number, the number of runs being merged at a time, is called the degree of merging, $d_{M}$. When the blocks have been read into memory, the computer proceeds to find the smallest value among them. When it has been found, it is removed from its buffer and put in the result buffer. When the result block is full, it is written to disk; when a buffer becomes empty, the next block from that sorted run is read into memory.

_The number of merge passes_: If we are not able to merge all runs in one merge pass, we might have to do another … and perhaps then another. If you've taken a course in either discrete mathematics or algorithms, you will recognize that this merge phase can be illustrated as a full $d_{M}$-ary tree – each leaf node being an initial run and each internal node the result of a merge. The height of any full $n$-ary tree is determined by the number of leaves (here, the initial runs), of which you take the logarithm, the logarithm having a base equal to the number of children each internal node can have. You'll probably see that this is the number of runs that can be merged in each pass, the degree of merging ($d_{M}$). Thus, the height of the merge tree will be $\log_{d_{M}}(n_{R})$. Because each merge pass has to both read and write a number of blocks equal to the size of the initial file, each merge pass also takes $2*b$ disk accesses. Now, we are almost at the solution: $n_{MERGE-PHASE} = (2*b)*\log_{d_{M}}(n_{R})$. The total number of disk accesses is thus:

$$n_{SORT-MERGE} = n_{SORT-PHASE} + n_{MERGE-PHASE} = (2*b) + (2*b)*\log_{d_{M}}(n_{R})$$
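A quick numeric check of the formula, with a file size and buffer count that are made up for illustration; the number of merge passes is rounded up to whole passes, and at least three buffer blocks are assumed:

    from math import ceil

    def external_sort_io(b, nB):
        """Disk accesses for sort-merge external sorting of b blocks with nB buffers."""
        nR = ceil(b / nB)               # number of initial runs
        dM = min(nB - 1, nR)            # degree of merging
        passes = 0
        runs = nR
        while runs > 1:                 # each pass merges groups of dM runs into one
            runs = ceil(runs / dM)
            passes += 1                 # passes == ceil(log_dM(nR))
        return 2 * b + 2 * b * passes

    # E.g. sorting a 1024-block file with 5 buffer blocks:
    # nR = ceil(1024/5) = 205 initial runs, dM = 4, which gives 4 merge passes
    print(external_sort_io(1024, 5))    # 2*1024 + 2*1024*4 = 10240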
## Partition-hash join

Uses a single hash function to partition both files, then compares within the resulting hash buckets. We'll explain this with an example. Let's say we have two files X and Y with join attributes A and B respectively, and X has fewer records than Y.

__Partition phase:__ First, we make a single pass through file X and hash its records with a hash function _h_. Let's say X is small enough that all the hashed records fit in memory. Now the records with the same hash value _h(A)_ are in the same hash bucket.

__Probing phase:__ We make a single pass through the other file Y and hash each record using the same hash function, _h(B)_, to probe the corresponding bucket, and the record is joined with all the matching records from X in that bucket.

The general-case partition-hash join (when the smaller file does not fit entirely in main memory) is not part of the curriculum (as of May 2014).

## Primitive operators

|| Projection (π) || The projection of a relation R on a set of attributes A is the set of all tuples in R restricted to the set of attributes in A. Basically the SELECT of relational algebra. ||
|| Selection (σ) || A selection with the set of conditions C selects all the tuples in a relation R such that the conditions in C hold. Basically the WHERE of relational algebra. ||
|| Rename (ρ) || Renames attributes in a relation. Basically the AS of relational algebra. ||
|| Natural Join (*) || Joins two relations R and S. The result of the join is the set of all combinations of tuples in R and S that are equal on their common attribute names. Basically the INNER JOIN of relational algebra. ||
|| Theta Join (⋈) || Joins on a specific condition. ||

# Other relevant documents

- [Compendiums written by individual students](https://wiki.online.ntnu.no/projects/online/wiki/TDT4145>)
- [Lecture notes for chapters 16 to 24](http://goo.gl/g0nZk)
- ["How to fake database design"](http://blogs.perl.org/users/ovid/2013/07/how-to-fake-database-design.html)
- ["Simple Conditions for Guaranteeing Higher Normal Forms in Relational Databases", or a shortcut to fifth normal form](http://researcher.watson.ibm.com/researcher/files/us-fagin/tods92.pdf)
- ["Method for determining candidate keys and highest normal form of a relation based on functional dependencies"](http://www.rlvision.com/blog/method-for-determining-candidate-keys-and-highest-normal-form-of-a-relation-based-on-functional-dependencies/)

# Video lectures

Apart from the first video, only Svein Erik Bratsberg's lectures have been recorded.

<https://video.adm.ntnu.no/openVideo/serier/5124992265fdb>

1. Normalisation theory (Roger Midtstraum)
2. Intro part 2: learning goals. Related to Fellesprosjektet.
3. ...
4. ...
5. Storage and indexes
6. ...
7. ...
8. ...
9. ...
10. ...
11. Transactions
12. Transactions
13. Transactions: recoverability
14. ...
15. Recovery
16. ...