Table of Contents
  1. Introduction: Terms and expressions
  2. Entity Relationship model
    1. ER-notation used in the lectures
  3. Relational algebra
  4. Database normalization
    1. Normal forms (aka NF)
    2. Functional Dependencies
      1. Rules for functional dependencies
    3. Desirable properties of a relational decomposition
      1. Nonadditive join test for a binary decomposition
    4. Keys
  5. Types of failures
  6. Database storage structures
    1. List of storage structures
    2. Index and storage concepts
      1. Hashing schemes
        1. Static hashing schemes
        2. Extendible hashing
        3. Linear hashing
        4. Pros & cons with hashing-schemes
      2. B+ trees
        1. Indexing of a B+ tree
        2. Constructing and inserting into a B+ tree
  7. Transactions
    1. ACID: Atomicity, consistency, isolation and durability
    2. Schedule (or History)
      1. Recoverability of a schedule
      2. Serializability of a schedule
    3. Types of concurrency problems
      1. The isolation levels of SQL
    4. Protocols for concurrency control
      1. Two-phase locking (2PL) protocols
        1. Variations of two-phase locking
          1. Pros & cons with two-phase locks
      2. Problems that may occur in two-phase locking
    5. Caching/buffering Disk Blocks
    6. Policies used in transaction control
    7. The system log
    8. ARIES (aka Algorithms for Recovery and Isolation Exploiting Semantics)
      1. High-level overview
      2. In-depth explanation
        1. 1 - Analysis
        2. 2 - REDO
        3. 3 - UNDO
  8. SQL
    1. Retrieving records
      1. Simple select example with conditions
      2. Queries with grouped results
    2. Modifications of tables or database
      1. Alter a table
      2. Insert new entry
      3. Delete an entry
      4. Update an entry
    3. DDL-operations
      1. Views
    4. Functions
      1. Aggregate Functions
      2. Scalar functions
  9. Algorithms for natural join/equijoin
    1. Nested-loop join
    2. Single-loop join
    3. Sort-merge join
      1. The sort-merge algorithm for external sorting
    4. Partition-hash join
    5. Primitive operators
  10. Other relevant documents:
  11. Video lectures

TDT4145: Data Modelling and Database Systems

Tags:
  • database
  • sql
  • datamod

Introduction: Terms and expressions

Database A database is a collection of data.
Data Facts that can be recorded and that have an implicit meaning
Miniworld or Universe of Discourse (UoD) A database represents a part of the real world, and we call this part a miniworld or UoD. Changes in the real world are reflected in the database.
Database model A type of data model that determines the logical structure of a database. A database model can be represented in many different ways, eg. through an ER-model.
Database management System (DBMS) A collection of programs that allows users to create and maintain a database.
Database system Refers collectively to the database, database model, and the DBMS.

Entity Relationship model

An Entity Relationship Model is a data model for describing the structure and organization of data ultimately to be implemented in a database. An ER-model is usually described through an ER-diagram. The ER-model is for the most part composed of entities that represent objects or things, and relationship classes that represent the relationship between two entities. The entities are identified with a name and attributes.

ER-notation used in the lectures

As of May 2014. Described in Norwegian.

(Figures: ER-notation part 1 and ER-notation part 2.)

Relational algebra

Relational algebra is used to describe how data can be organized into sets, and it is what modern query languages are built on. A statement in relational algebra can be given either as text or as a tree. Any projection ($\pi$) results in a distinct set, so there is no operator for "distinct" queries in relational algebra.

Database normalization

Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency.

Normal forms (aka NF)

A normal form is a set of criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The most common normal forms are:

1NF A table represents a relation and has no repeating groups. All attribute values are required to be atomic.
2NF No non-prime attribute in the table is partially dependent on any candidate key.
3NF Every non-prime attribute in the table is directly (non-transitively) dependent on every candidate key.
BCNF Every non-trivial functional dependency is a dependency on a superkey.
4NF For every non-trivial multi-valued dependency (MVD) $X \twoheadrightarrow Y$, X is a superkey.

Note that to meet the requirements of a normal form, a table must also meet all the requirements of the lower normal forms.

Functional Dependencies

A functional dependency, written $ A \rightarrow B $, on some relation (table) where A and B are sets of attributes (columns) in the relation, means that if two tuples have the same value in A they must have the same value in B.

$$ A \rightarrow B \Longleftrightarrow t_{n}[A]=t_{m}[A] \Rightarrow t_{n}[B]=t_{m}[B] $$

If more than one attribute appears on either side of the arrow, the attributes are combined ("and'ed" together): the tuples must agree on all of them.
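
To make the definition concrete, here is a minimal Python sketch (not from the curriculum) that checks whether a functional dependency holds in a small relation represented as a list of dictionaries; the example relation and attribute names are made up:

def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in `rows`.

    rows: list of dicts (the tuples of the relation)
    lhs, rhs: lists of attribute names
    """
    seen = {}  # maps lhs-values to the rhs-values first seen with them
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        if key in seen and seen[key] != value:
            return False  # two tuples agree on lhs but differ on rhs
        seen[key] = value
    return True

# Hypothetical relation: emp -> dept should hold, dept -> salary should not.
emp = [
    {"emp": 1, "dept": "IDI", "salary": 500},
    {"emp": 2, "dept": "IDI", "salary": 450},
    {"emp": 1, "dept": "IDI", "salary": 500},
]
print(fd_holds(emp, ["emp"], ["dept"]))     # True
print(fd_holds(emp, ["dept"], ["salary"]))  # False: same dept, different salary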

Rules for functional dependencies

Reflexivity If Y is a subset of X, then $ X \rightarrow Y $
Augmentation If $ X \rightarrow Y $, then $ XZ \rightarrow YZ $
Transitivity If $ X \rightarrow Y $ and $ Y \rightarrow Z $, then $ X \rightarrow Z $

These three are Armstrong's axioms, and they will get you far. From them, the following rules can also be derived:

Union If $ X \rightarrow Y $ and $ X \rightarrow Z $, then $ X \rightarrow YZ $
Decomposition If $ X \rightarrow YZ $, then $ X \rightarrow Y $ and $ X \rightarrow Z $
Pseudotransitivity If $ X \rightarrow Y $ and $ WY \rightarrow Z $, then $ WX \rightarrow Z $
Composition If $ X \rightarrow Y $ and $ Z \rightarrow W $, then $ XZ \rightarrow YW $

So what can these rules be used for?

Closure of a set of functional dependencies

We have a set of functional dependencies F. The closure of F is a set of functional dependencies F' that are logically implied by F. In more human terms, it is the set of all possible functional dependencies that can be derived from the functional dependencies of F using the rules specified above.

As well as finding the closure of a set of functional dependencies, you might want to work out the closure of just one or more attributes, using the set F. (This is the set of all things you can know, knowing only the values of these attributes.) The algorithm can be described with the following steps, and a small code sketch follows the list:

  1. Start with a set $S$, the attributes that you want to find the closure of.
  2. For the set $F$ of functional dependencies, look at each functional dependency and see if you have all the left-hand side attributes in $S$. If so, add all the right-hand side attributes of this particular functional dependency to a set of new attributes, $N$.
  3. Now, $S = S \cup N$
  4. If $N$ was empty, you have found all possible attributes that can be known from the initial $S$. If $N$ was not empty, repeat from step 2 – there might still be territory to explore.
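
The steps above translate almost directly into code. A minimal Python sketch, assuming functional dependencies are given as (lhs, rhs) pairs of attribute sets (the example FDs are made up):

def attribute_closure(attrs, fds):
    """Compute the closure of `attrs` under the functional dependencies `fds`.

    attrs: iterable of attribute names (the set S in the steps above)
    fds:   list of (lhs, rhs) pairs, each a set of attribute names
    """
    closure = set(attrs)
    changed = True
    while changed:                      # repeat until no new attributes (N empty)
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs          # add the right-hand side attributes
                changed = True
    return closure

# Hypothetical FDs: A -> B, B -> C
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(attribute_closure({"A"}, F))      # {'A', 'B', 'C'}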

Equivalent functional dependencies

Two sets of functional dependencies A and B are said to be equivalent if the closures of their sets of functional dependencies are the same. However, you'll not want to do this by hand on your exam (unless you possess the ability to travel back through time).

The quick and easy way is to see if you can apply the rules above to derive A from B as well as B from A. If you're able to derive every functional dependency in each set from the other, they are functionally equivalent.

Desirable properties of a relational decomposition

There are four properties that a decomposition of a relation $ R $ into several relations $ R_{1}, R_{2}, … , R_{n} $ should possess or strive to attain:

Property Explanation
Preservation of attributes All attributes of the original relation should be present in at least one of the new relations, so that no attributes are lost.
Preservation of functional dependencies All functional dependencies of the original relation should be preserved. (This is not always possible when decomposing into higher normal forms.)
Lossless join Also called the nonadditive join property. When the new relations are joined, no additional (spurious) tuples should be generated.
Highest possible normal form The individual relations of the decomposition should attain the highest normal form possible.

The properties are described in detail in chapter 15.2 in Elmasri/Navathe.

Nonadditive join test for a binary decomposition

A binary decomposition is a decomposition of one relation into two (and only two) relations. As mentioned above, a nonadditive (or lossless) join is a join in which no additional tuples are generated. (In other words, this is really a test of whether it is possible to regenerate the original relation from the two new ones; a small code sketch of the test follows the conditions below.)

$F^{+}$ is the closure of the functional dependencies of the original relation $R$, which has been decomposed into the two relations $R_{1}$ and $R_{2}$ – a binary decomposition. The binary decomposition has the lossless join property if either:

  • The functional dependency $((R_{1} \cap R_{2}) \rightarrow (R_{1} - R_{2}))$ is in $F^{+}$, or
  • The functional dependency $((R_{1} \cap R_{2}) \rightarrow (R_{2} - R_{1}))$ is in $F^{+}$.
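
Using the attribute-closure sketch from the Functional Dependencies section above, the test can be written as a small check (same made-up representation: relations as attribute sets, FDs as (lhs, rhs) pairs):

def lossless_binary_decomposition(r1, r2, fds):
    """Test the nonadditive (lossless) join property for a binary decomposition.

    r1, r2: sets of attribute names for the two new relations
    fds:    list of (lhs, rhs) pairs of attribute sets for the original relation
    """
    common = r1 & r2
    closure = attribute_closure(common, fds)   # defined in the sketch above
    # Lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+
    return (r1 - r2) <= closure or (r2 - r1) <= closure

# Hypothetical example: R(A, B, C) with A -> B, decomposed into (A, B) and (A, C)
F = [({"A"}, {"B"})]
print(lossless_binary_decomposition({"A", "B"}, {"A", "C"}, F))  # True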

Keys

Superkey A set of attributes of a relation schema whose values are unique for each tuple. I.e. if S is a superkey of the relation schema R, then no two tuples in R will have the same combination of values in S: for $ t_{n}, t_{m} \in r(R) $ with $ n \neq m $ and $ S \subseteq R $, we have $ t_{n}[S] \neq t_{m}[S] $. Here $r(R)$ is a particular state (instance) of the table.
Key A minimal superkey: removing one of the key's attributes results in it ceasing to be a superkey.
Candidate key If a relation has more than one key, they are called candidate keys: candidate keys are a set of attributes that qualify as a key.
Primary key Whichever candidate key is arbitrarily chosen as a relation's key.
Prime attribute Prime attributes are attributes that are part of one or more candidate keys.

Types of failures

A database system may fail, and there are different types of failures. Failures of type 1-4 are common, and the system is expected to automatically recover from these. Failures of type 5 and 6 are less common, and recovering from them can be a major undertaking.

  1. A computer failure / crash
  2. A transaction or system error
  3. Local errors or exception conditions detected by the transaction
  4. Concurrency control enforcement
  5. Disk failure
  6. Physical problems and catastrophes

Database storage structures

Databases are typically stored on secondary (non-volatile) storage.

List of storage structures

Heap file The simplest storage structure: an unordered list of records. Records are inserted at the end of the heap file, giving O(1) insertion. Retrieval requires a linear search ( O(n) ), and is therefore quite inefficient for large values of n. Deleted records are only marked as deleted, and not actually removed. This is O(1) (in addition to the time it takes to retrieve a record to be deleted). Highly volatile heap-files must be re-structured often to prune away records that are marked as deleted.
Hash buckets Hash functions directly calculate the page address of a record based on values in the record. A good hashing function has an even spread across the address space. When doing calculations on hash buckets (say, for an exam), it is safe to assume about a 60% fill rate. Ignoring collisions, hash buckets are O(1) for retrieval, insertion, modification and deletion. Hash buckets often suck for range retrieval.
B+-trees A pretty good all-around tree storage structure. It's always balanced, and the indexed keys are kept in sorted order. It is usually safe to assume a height of 3 or 4. Retrieval, insertion, modification and deletion are O(log n). Good for volatile data, as the index dynamically expands and contracts as the data set grows and shrinks. Less optimal for non-volatile/stable files - use ISAM for that.
ISAM (Indexed Sequential Access Method) A static, multi-level index built when the file is created. Works well for stable (non-volatile) files, but new records go into overflow areas, so performance degrades as the file changes and the index must be rebuilt periodically.

Index and storage concepts

First some vocabulary:

Index field A field or attribute of the record/row that is indexed
Primary index An index structure that indexes the primary key of the records.
Secondary index An index which is not a primary index.
Clustered index An index where the records themselves are stored physically in the same order as the index entries (often in the index's leaf level). In practice, we use either a B+-tree or a hash index.

The point of an index is to make searching faster. The index points to where the actual data for the key is located. So now that we have the structures and some vocabulary, let's look at some ways to implement storage:

  • Clustered B+-tree/clustered index: We use a B+-tree index on the primary key. Storage of data is done on the leaf-level in the tree (thus clustered index).
  • Heap file and B+-tree: The table is stored in a heap file, and we use a B+-tree index on the primary key.
  • Clustered hash index: Hash index on the primary key. Clustered index storing the actual records of the table.

Hashing schemes

Static hashing schemes

These have a fixed number of buckets (static).

Internal hashing is used for internal files (files that fit in main memory). It is implemented as a hash table through the use of an array of records, e.g. an array indexed from $0$ to $M-1$, giving $M$ slots. We apply a hash function $h(k)$ to the keys; it should give a number between $0$ and $M-1$, and $h(k) = k \bmod M$ is often used.

Two keys hashing to the same bucket is called a collision. This can be solved by:

  • Open addressing: if there's a collision, go through the subsequent positions until you find an empty slot.
  • Chaining: We extend the array with a number of overflow locations and add a pointer field to each record location. If there's a collision, we place the record in an unused overflow position and set the pointer of the occupied hash address to that overflow location (see the sketch below).
  • Multiple hashing: if there's a collision, we apply a second hash function and hash to another slot. If the second hash function also leads to a collision, open addressing is used.

External hashing for disk files: The target space is made up of buckets, each of which holds multiple records, and we hash to one bucket. For handling collisions, we use a version of chaining where each bucket holds a pointer to a linked list of overflow records for that bucket.
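
As an illustration of a static scheme with chaining, here is a minimal in-memory Python sketch (the bucket array and chains stand in for disk buckets and overflow blocks; M and the keys are arbitrary):

M = 5  # fixed number of buckets (static)

def h(k):
    return k % M

# Each bucket is a list that also serves as its overflow chain.
buckets = [[] for _ in range(M)]

def insert(key):
    buckets[h(key)].append(key)        # collisions end up in the same chain

def lookup(key):
    return key in buckets[h(key)]      # search only the chain of one bucket

for k in (7, 12, 3, 17):               # 7, 12 and 17 all hash to bucket 2
    insert(k)
print(lookup(12), lookup(4))           # True False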

Extendible hashing

Extendible hashing treats a hash as a bit string, and uses a trie for bucket lookup. Re-hashing is done incrementally, one bucket at a time, as needed. Both LSB and MSB implementations are common.

Linear hashing

You start with $M$ buckets (numbered $0$ to $M-1$) and a hash function $h_{i}(k) = k \bmod M$. When a bucket overflows, the bucket pointed to by the split pointer $n$ is split in two, and its records are redistributed using a new hash function $h_{i+1}(k) = k \bmod 2M$. You also increment $n$, which counts the number of splits done so far; initially $n=0$. A lookup sketch follows below.

On retrieval you first try $h_{i}(k)$, and if $h_{i}(k)<n$ the bucket's already been split, and you apply $h_{i+1}(k)$ instead. Over time you'll have $2M$ buckets, and all buckets will use $h_{i+1}$.

Linear hashing allows for expansion of the hash table one slot at a time, round-robin.
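
A minimal sketch of the lookup rule, assuming the split pointer n and the two hash functions are tracked as plain variables (the numbers are arbitrary):

M = 4      # initial number of buckets
n = 0      # split pointer: buckets 0..n-1 have already been split this round

def h_i(k):
    return k % M

def h_i1(k):
    return k % (2 * M)

def bucket_for(k):
    """Return the bucket number a key belongs to under linear hashing."""
    b = h_i(k)
    if b < n:          # this bucket has already been split,
        b = h_i1(k)    # so the finer hash function decides
    return b

n = 2                   # pretend buckets 0 and 1 have been split
print(bucket_for(9))    # h_i(9) = 1 < n, so use h_i1: 9 % 8 = 1
print(bucket_for(11))   # h_i(11) = 3 >= n, so bucket 3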

Pros & cons with hashing-schemes

Hashing Pros Cons
Static Works well if you know the number of records in advance Overflow chaining
Extendible Handles dynamic data input; splits one bucket at a time The catalog (directory) doubles when it has to grow
Linear No catalog (directory) needed; handles dynamic data Uncontrolled splitting

B+ trees

A B+-tree is a storage structure that is constructed as an $n$-ary tree. We expect that you already have some basic knowledge about trees. In the tree, each node (including leaves) corresponds to one disk block.

Indexing of a B+ tree

The way the indexing of the B+-trees works, here demonstrated with an unclustered B+-tree:

  • All the values of the index field, e.g. the numbers from 0 to 99, are stored sorted at the leaf level. Let's imagine that there is room for 10 index values in each of these leaves, and that each leaf node is completely full. Each index value in the leaf level points to the place where the data of the post with this index value is stored. In addition to this, at the end of each leaf node, there is a pointer to the next node, which makes it possible to iterate through the leaf level.
    • In a clustered B+ tree, there are no data pointers at the leaf level, but instead the record that the value points to. If the records are larger than data pointers, this leads to leaf nodes having a lower capacity than internal nodes.
  • The nodes at the levels above the leaf nodes serve as an index of the level below. The way this works is that each value $V_i$ ($0 < i \le n$) is stored with tree pointers to its left and right, $P_{i-1}$ and $P_i$ respectively. The left pointer points to a node where all values are less than $V_i$, and the right pointer points to a node with values greater than or equal to $V_i$. In our example, these are leaf nodes, but they might be other internal nodes. An internal node would look like this: $<P_0, V_1, P_1, \ldots , P_{n-1}, V_n, P_n>$.
    • In our example, the internal node would look like this: $<P_0, 10, P_1, 20, P_2, \ldots, P_8, 90, P_9>$.

Usually, when looking for some value, we start at the root level. Let's say that the index field is equal to the employee number of some employees, and we want to look for "Douglas", who has employee number 42. We start at the root node (which in our example is the only internal node), going through the values. We continue until we find the numbers 40 and 50 with a tree pointer in between them. Because $40 \le 42 < 50$, we follow that pointer and come to a leaf node. We look through the leaf node until we find 42 and its data pointer, which points to employee 42's record.

It's important to remember that in a B+-tree a block should be at least 50% filled. On average, a block will be about 67% filled; some sources operate with a 69% fill degree.

Constructing and inserting into a B+ tree

Before reading this, it is important that you understand the indexing of a B+ tree.

We will now have to elaborate on the differences between the different kinds of nodes:

  • The root node is at the top of the tree and is the node at which we start our searches – both for lookup and insertion. A root node is no different from an internal node in its contents, but it has a special role when inserting.
  • The internal nodes are the nodes that serve as indices for the leaf nodes.
  • The leaf nodes are at the bottom level of the tree. They store all occurrences of the index values with the actual data (or a pointer to them in the unclustered case) connected to them. All leaf nodes have a pointer to the next leaf node.

For this explanation, we will be concerned with the placement of the indices. We take it for granted that you understand where data pointers in internal and leaf nodes are located.

The algorithm for insertion of a new value $V$ is as follows:

  1. Perform a search to determine which leaf node should contain $V$.
  2. If the leaf node is not full, insert $V$. You will not have to follow any more steps.
  3. If the leaf node is full, it must be split:
    1. Split the node at the middle, creating one new node for the elements to the right of the middle. If there is an odd number of records in the node, you have to decide whether your algorithm puts the middle element in the left or the right node. Both options are fine, but you have to pick one and apply it consistently.
    2. Insert the new leaf's smallest key as well as a pointer to it in the node above (the parent).
  4. If there is no room for the new key in the internal node, we have to do an internal split.
    1. An internal split is equal to the leaf node split, except: When we split an internal node in two, we find the middle element. (Now, if the number of elements is even, make a decision and stick with it.) Move the middle element up to the level above. This means that the element will only occur at the level above, not the level below (as opposed to leaf node splitting).
    2. Continue this process of internal splitting until you reach a parent node that needs not be split.
    3. You might not reach a parent node that doesn't have to be split: The root node can be full, and you have to do a root node split. This isn't as hard as it sounds; it only means that the tree will grow in height. It's as simple as doing an internal split of the root node, moving the middle element up to a newly constructed node, which becomes the new root node.

Constructing the tree is no harder than simply starting with one leaf node (which is also, technically, the root node), and following this algorithm rigorously. Create the first internal node when you have to split the root node. When drawing this, it is obligatory that you draw arrows for the data pointers in the tree.
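
The essential difference between the two kinds of splits (the leaf copies a key up, the internal node moves a key up) can be shown in a few lines. A sketch assuming keys are kept in sorted Python lists and that the middle element goes to the right node:

def split_leaf(keys):
    """Leaf split: the smallest key of the new right node is copied up."""
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]          # right[0] also stays in the leaf

def split_internal(keys):
    """Internal split: the middle key is moved up and kept out of both halves."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid + 1:], keys[mid]

print(split_leaf([10, 20, 30, 40]))       # ([10, 20], [30, 40], 30)
print(split_internal([10, 20, 30, 40]))   # ([10, 20], [40], 30)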

Transactions

A transaction is a unit of work performed within a DBMS. They provide reliability and isolation. Transactions are "all or nothing", in the sense that the entire unit of work must complete without error, or have no effect at all.

It is important to note that a transaction is the low-level execution of some database query, described purely by reading and writing values (plus the more advanced operations that we'll come to discuss). This sequence of operations might represent an operation such as "Update employee John's salary" or "Find the artist named Justin Bieber and remove him from my library". However, because it is low-level (as opposed to the high-level SQL query that represents it), you won't see the actual intention when looking at the transaction; you'll only see the read, write and other low-level operations involved.

ACID: Atomicity, consistency, isolation and durability

Transactions must conform to the ACID principles:

Atomicity A unit of work cannot be split into smaller parts—a unit of work can either complete entirely or fail entirely.
Consistency A transaction always leaves the database in a valid state upon completion, with all data valid according to defined rules such as constraints, triggers, cascades etc.
Isolation No transaction can interfere with another transaction.
Durability When a transaction completes, it remains complete, even in the case of power loss, crashes etc.

A typical transaction follows the following lifecycle:

  1. Begin the transaction
  2. Execute a set of data manipulations/queries
  3. If no errors occur, then commit the transaction and end it
  4. If errors occur, then rollback the transaction and end it

Schedule (or History)

Schedules are lists of operations (usually data reads/writes, locks, commits etc) ordered by time, eg. $$ S_{a}: r_1(X); w_1(X); C_1; $$

S is the name of the schedule. $r_{n}$ denotes a read operation done by transaction n, and $w_{n}$ a write operation done by transaction n. A $c_{n}$ means commit (the transaction's changes can be made permanent), and $a_{n}$ means abort (the changes made by the transaction are rolled back).

Schedules can be classified into different classes, based on either serializability or recoverability. It is important to note that these classifications are completely independent – that is, knowing only the recoverability of a schedule, tells us nothing about the serializability and vice versa.

Recoverability of a schedule

All schedules can be grouped into either nonrecoverable or recoverable schedules. A schedule is recoverable if it is never necessary to roll back a transaction that has been committed. In the group of recoverable schedules, we have a subgroup: Schedules that Avoid Cascading Aborts (ACA), and this again has a subgroup: Strict schedules.

$$ Recoverable \supset ACA \supset Strict $$

Here are the different levels of recoverability:

Recoverable A schedule is recoverable if every transaction commits only after the transactions it has read from have committed. "Read from" refers to reading a value that another transaction has written. E.g. $S: w_1(A), r_2(A), c_1, c_2$. If a schedule fails to meet this criterion, it is nonrecoverable.
Avoiding Cascading Abort (ACA) When one transaction has changed one or more values and must be aborted, transactions that have read these values must also be aborted. To avoid this, transactions may only read values written by committed transactions. However, it is completely okay to write to an item that has been written to by another uncommitted transaction. For example: $S: w_1(X), r_2(Y), w_2(X), c_1, r_2(X), c_2$
Strict A strict schedule doesn't allow any reads from or writes to values that have been changed by an uncommitted transaction; such a schedule simplifies the recovery process. For example: $S: w_1(X), r_2(Y), c_1, w_2(X), c_2$

Serializability of a schedule

Ideally, every transaction should be executed from start to end (commit or abort) without any other transactions active (a serial schedule). However, this is a luxury we cannot afford when working with databases, mainly because a transaction spends a lot of time waiting for data being read from the disk. Therefore, transactions are executed in an interleaved fashion: When there is available time, another transaction jumps in and does some of the work it wants to do.

It is apparent that interleaving operations introduce some new problems. For instance: $A$ is your bank account. Transaction $T_1$ reads the amount in $A$ and adds 500 dollars while $T_2$ reads the amount in the account and adds interest (i.e. multiplies the value by some number). As you can imagine, reordering the operations might affect the end result (if $T_2$ reads while $T_1$ prepares to add the 500 dollars, the result of $T_1$ will be forgotten when $T_2$ writes its values). If we can interleave the operations so that the end result is guaranteed to be the same as if they execute in a serial way, the schedule is called serializable.

In short:

Serial Transactions are non-interleaved. No transaction starts before the previous one has ended. Example: $S: r_1(X); w_1(X); c_1; w_2(X); c_2$.
Serializable Schedules that have an outcome equivalent to some serial schedule are serializable. A schedule is conflict serializable if and only if its precedence graph is acyclic (see below).

This course only looks at conflict serializability. If you are interested in what this actually means, consult the book; you only have to be able to test for conflict serializability to pass this course. The best test for conflict serializability is constructing a precedence graph.

Before looking at the precedence graph, it is necessary to define the concept of a conflict between transactions. Two operations are in conflict if all of these criteria are met:

  1. They belong to different transactions.
  2. They access the same item X.
  3. At least one of them is a write operation.

For a schedule $S$, the precedence graph is constructed with the following steps:

  1. For each transaction $T_{i}$ in $S$, draw a node and label it with its transaction number.
  2. Go through the schedule and find all conflicts (see criteria above). For each conflict, draw an edge in the graph, going from the transaction that accessed the item first to the transaction that accessed the item last in the conflict.

If there are no cycles in the graph, the schedule is conflict serializable.
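
A minimal Python sketch of the test, assuming the schedule is given as a time-ordered list of (transaction, operation, item) triples (the example schedule is made up):

from itertools import combinations

def conflict_serializable(schedule):
    """schedule: list of (transaction, op, item), op in {'r', 'w'}, in time order."""
    # Build the precedence graph: one edge per conflicting pair, from the
    # operation that came first to the one that came last.
    edges = set()
    for (t1, op1, x1), (t2, op2, x2) in combinations(schedule, 2):
        if t1 != t2 and x1 == x2 and 'w' in (op1, op2):
            edges.add((t1, t2))
    # The schedule is conflict serializable iff the graph has no cycle.
    nodes = {t for t, _, _ in schedule}
    graph = {t: {v for (u, v) in edges if u == t} for t in nodes}
    visiting, done = set(), set()

    def has_cycle(u):
        visiting.add(u)
        for v in graph[u]:
            if v in visiting or (v not in done and has_cycle(v)):
                return True
        visiting.discard(u)
        done.add(u)
        return False

    return not any(has_cycle(t) for t in nodes if t not in done)

# S: r1(X) w2(X) w1(X)  ->  edges T1 -> T2 and T2 -> T1, i.e. a cycle
S = [(1, 'r', 'X'), (2, 'w', 'X'), (1, 'w', 'X')]
print(conflict_serializable(S))  # False: the precedence graph has a cycle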

Types of concurrency problems

Multiple transactions executing with their operations interleaved might introduce problems. Here are the four different kinds; the first three definitions are taken from the Wikipedia article on concurrency control (as of May 24th, 2014).

Dirty read Transactions read a value written by a transaction that has been later aborted. This value disappears from the database upon abort, and should not have been read by any transaction ("dirty read"). The reading transactions end with incorrect results.
Lost update A second transaction writes a second value of a data-item on top of a first value written by a first concurrent transaction, and the first value is lost to other transactions running concurrently which need, by their precedence, to read the first value. The transactions that have read the wrong value end with incorrect results. (Wikipedia, Concurrency control) For example: $w_1$(A), $w_2$(A), $w_2$(B), $c_2$, $w_1$(B), $c_1$
Incorrect summary While one transaction takes a summary of the values of all the instances of a repeated data-item, a second transaction updates some instances of that data-item. The resulting summary does not reflect a correct result for any (usually needed for correctness) precedence order between the two transactions (if one is executed before the other), but rather some random result, depending on the timing of the updates, and whether certain update results have been included in the summary or not.
Unrepeatable read Transaction $T_1$ reads value A. Another transaction $T_2$ writes to A before $T_1$ reads from it another time, but the value of A has (inexplicably, to $T_1$) changed – the read is thus unrepeatable.

Another problem that is common, but not usually grouped together with these, is phantoms. A phantom is a value in a group that is changed (or appears/disappears) while some other transaction does work on this group. For example: $T_1$ wants to find the average salary of all employees working in the department named "Funny hats", and it reads all of the values twice (for its own mysterious reasons). $T_2$ wants to find the employee named "Oliver Klozoff", who works in the "Funny hats" department, and lower his salary by one dollar. If $T_1$ reads all the values, including Oliver's salary, and $T_2$ then jumps in and changes the salary, his salary is now a phantom value. When $T_1$ returns and reads all of the values again, it discovers that one of the values has changed: a phantom read has occurred.

The isolation levels of SQL

SQL has different isolation levels that are used to combat these problems, each employing some of the techniques described in other parts of this article. What you should be concerned with is which problems might appear at the different isolation levels. The SQL standard specifies Serializable as the default level (MySQL's InnoDB, for instance, defaults to Repeatable read).

Isolation level Dirty read Unrepeatable read Phantoms
Read uncommitted Yes Yes Yes
Read committed No Yes Yes
Repeatable read No No Yes
Serializable No No No

Protocols for concurrency control

Concurrency control protocols are sets of rules that guarantee serializability, i.e. they guarantee that interleaved transactions produce the same result as if they had been executed serially, without overlapping in time. 2PL (see below) is the only protocol that is part of the curriculum.

Two-phase locking (2PL) protocols

A lock is a variable associated with a data item. A lock is basically a transaction's way of saying to other transactions "I'm reading this data item; don't touch" or "I'm writing to this data; don't even look at it".

Binary locks A binary lock on an item can have two states: Locked or unlocked (1 or 0). If the lock on an item is 1, then that data item cannot be accessed by another transaction. When a transaction starts to operate on an item, it locks it. When it's done, it unlocks the item. If a transaction wants to do operations on a locked item, it waits until it is unlocked (or aborts or recommences at some other time). Binary locks are not usually used because they limit concurrency.
Shared/exclusive (or Read/Write) locks As said, binary locking is too restrictive. This leads us to multi-mode locks. Here there are three locking operations that can be done on a data item X: read_lock(X), write_lock(X) and unlock(X). This means there are three possible states for the data item: read-locked (also called shared-locked), write-locked (also called exclusive-locked) and unlocked.

By making sure that the locking operations occur in two phases, we can guarantee serializable schedules, and thus we get two-phase locking protocols. The two phases are:

  1. Expanding (growing): New locks on items can be acquired, but none can be released. (If the variation allows, locks can also be upgraded, i.e. a read lock converted to a write lock.)
  2. Shrinking: Existing locks can be released, but no new locks can be acquired. (If the variation allows, locks can be downgraded from write locks to read locks.)
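
A minimal Python sketch of the phase rule itself (lock modes, waiting and the actual lock table are deliberately left out; the class and names are made up for illustration):

class TwoPhaseLockingError(Exception):
    pass

class Transaction:
    """Tracks only the 2PL phase rule: no new locks after the first release."""
    def __init__(self, tid):
        self.tid = tid
        self.locks = set()
        self.shrinking = False   # flips to True on the first unlock

    def lock(self, item):
        if self.shrinking:
            raise TwoPhaseLockingError("cannot acquire locks in the shrinking phase")
        self.locks.add(item)

    def unlock(self, item):
        self.locks.discard(item)
        self.shrinking = True    # from now on, only releases are allowed

t = Transaction(1)
t.lock('X')
t.lock('Y')
t.unlock('X')
try:
    t.lock('Z')                  # violates the two-phase rule
except TwoPhaseLockingError as e:
    print(e)                     # cannot acquire locks in the shrinking phase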

Variations of two-phase locking

Basic 2PL The most liberal version of 2PL; works as described above. Has a growing phase and shrinking phase.
Conservative 2PL (aka static 2PL) Requires a transaction to set all locks before the transaction begins. If it is unable to set all locks, it aborts and attempts at a later point in time. This eliminates deadlocks (see Problems that may occur in two-phase-locking), but it might be hard to know what items can be locked beforehand – thus it is hard to implement.
Strict 2PL A transaction does not release any write locks until it has committed or aborted, but it can release read locks. Guarantees strict schedules.
Rigorous 2PL A transaction T does not release any of its locks (neither read nor write) until after it commits or aborts. (Most commonly used today.)

Pros & cons with two-phase locks

Locks Pros Cons
Basic Releases locks early Must acquire all locks before any is released
Conservative Deadlock-free Limited concurrency
Strict Guarantees strict schedules Must acquire all locks before any is released, and must commit (or abort) before releasing write locks
Rigorous Easy to implement More limited concurrency than strict

Problems that may occur in two-phase locking

Deadlock: A deadlock occurs when each transaction $T$ in a set $S$ of two or more transactions is waiting for some item that is locked by some other transaction $T'$ in the same set; the transactions all hold some item hostage, and none of them is willing to let go until it gets what it needs to continue.

Ways to deal with it:

  • Conservative 2PL: This requires every transaction to lock all the items it needs in advance. This avoids deadlock.
  • Deadlock detection: This can be done with the system making and maintaining a wait-for graph, where each node represents an executing transaction. If a transaction T has to wait, it creates a directed edge to the transaction it's waiting for. There's a deadlock if the graph has a cycle. A challenge is for the system to know when to check for the cycle. This can be solved by checking every time an edge is added, but this may cause a big overhead, so you may want to also use some other criteria, eg. checking when you go above a certain number of executing transactions.
  • Timeouts: This requires that a transaction gets a time to live beforehand. If it has not finished executing within that time, it is killed off. This is simple and has a small overhead, but it may be hard to estimate how long a transaction will take to finish. This may result in transactions being killed off needlessly.

The last two are the most used, but there are also two other schemes based on a transaction timestamp TS(T), usually the time the transaction started executing. Say we have two transactions A and B; B has locked an item, and A wants access (both rules are sketched below):

  • Wait-die: If A is older than B, A is allowed to wait; otherwise A is aborted (it "dies") and restarted later with the same timestamp.
  • Wound-wait: If A is older than B, B is aborted (A "wounds" B) and restarted later with the same timestamp; otherwise A is allowed to wait.
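
A small sketch of the two timestamp rules, assuming a lower timestamp means older (the function names and return strings are made up):

def wait_die(ts_a, ts_b):
    """A requests an item held by B. Returns what happens to A."""
    return "A waits" if ts_a < ts_b else "A is aborted (dies), restarted with same TS"

def wound_wait(ts_a, ts_b):
    """A requests an item held by B."""
    return "B is aborted (wounded), restarted with same TS" if ts_a < ts_b else "A waits"

print(wait_die(5, 9))    # A is older -> A waits
print(wound_wait(5, 9))  # A is older -> B is wounded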

Two methods that don't use timestamps have been proposed to eliminate unnecessary aborts:

  • No Waiting (NW): Transactions that cannot obtain a lock are aborted and restarted after a time delay, without checking whether a deadlock will actually occur.
  • Cautious waiting: If $T_{B}$ (the transaction holding the lock) is not itself blocked waiting for some item, then $T_{A}$ is allowed to wait; otherwise $T_{A}$ is aborted.

Starvation: This occurs when a transaction cannot proceed for an indefinite period of time while other transactions operate normally. This may happen when the waiting scheme gives priority to some transactions over others. Use a fair waiting scheme, e.g. first-come-first-served or increasing priority with waiting time, and this won't be a problem.

Caching/buffering Disk Blocks

Disk pages that are to be updated by a transaction are usually cached in main memory buffers and updated in memory before they are written back to disk. When the system needs more space, it replaces blocks in the cache; writing a replaced block back is also called flushing the block.

Dirty bit Indicates whether or not the buffer has been modified. On flush, a block is only written to disk if the dirty bit is 1 (modified). One can also store the transaction id(s) of the transactions that changed the block.
Pin-unpin bit A page in cache is pinned (bit is 1) if it cannot be written to disk yet, eg. if the transaction that changed the page isn't committed yet.

Flush strategies

In-place updating Buffer is written back to the original disk location
Shadowing Writes an updated buffer at a different disk location. Multiple versions of data items can be maintained. Shadowing is not usually used in practice.
  • BFIM (Before image): Old value of data
  • AFIM (after image): New value of data

Policies used in transaction control

The following are four policies that can be used when writing a cached object back to disk:

NO-FORCE policy Changes made to objects are not required to be written to disk in-place. Changes must still be logged to maintain durability, and can be applied to the object at a later time. This reduces seek-time on disk for a commit, as the log is usually sequentially stored in memory, and objects can be scattered. Frequently updated objects can also merge accumulated writes from the log, thereby reducing the total number of writes on the object. Faster, but adds rollback complexity.
FORCE policy Changes made to objects are required to be written to disk in-place on commit. Even dirty blocks are written... Sucks.
STEAL policy Allows a transaction to be written to a nonvolatile storage before its commit occurs. Faster, but adds rollback complexity.
NO-STEAL policy Does not allow a transaction to be written to a nonvolatile storage before its commit occurs. Sucks.

Performance wise, if anyone asks, STEAL NO-FORCE is the way to go. In a STEAL/NO-FORCE scheme both REDO and UNDO might be required during recovery.

The system log

A DBMS maintains a system log tracking all transaction operations to be able to recover from failures. The log must be kept on non-volatile storage, for obvious reasons. A log consists of records of different types:

  1. [start_transaction, transaction_id]
  2. [write_item, transaction_id, item_id, old_value, new_value]
  3. [read_item, transaction_id, item_id]
  4. [commit, transaction_id]
  5. [abort, transaction_id]

ARIES (aka Algorithms for Recovery and Isolation Exploiting Semantics)

High-level overview

Recovery algorithm designed to work with a no-force, steal database approach. ARIES has three main principles:

Write ahead logging Any change to an object is first recorded in the log, and the log must be written to non-volatile storage before changes to the object are made.
Repeating history during redo On restart after a crash, ARIES retraces the steps in history necessary to bring the database to the exact state at the time of the crash, and then undoes the transactions that were still active at the time of the crash.
Logging changes during undo Changes made to the database during undo are logged to prevent unwanted repeats.

Two datastructures must be maintained to gather enough information for logging: the dirty page table (DPT) and the transaction table (TT):

Dirty page table (DPT) Keeps a record of all pages that have been modified in the buffer and not yet written to disk. For each such page it also stores the LSN (log sequence number) of the first log record that made the page dirty (recLSN).
Transaction table (TT) Contains all transactions that are currently running, together with the LSN of the last log record written by each transaction (lastLSN).

ARIES periodically saves the DPT and the TT to the log file, creating checkpoints (including the list of active transactions), to avoid rescanning the entire log file in the case of a redo. This allows skipping pages that are not in the DPT, or that are in the DPT but have a recLSN greater than the LSN of the log record in question.

In case of a crash ARIES has three steps to restore the database state:

  1. Analysis: Identifies dirty pages in the buffer, and the set of transactions active at the time of the crash.
  2. Redo: Reapplies updates from the log to the database from the point determined by analysis.
  3. Undo: The log is scanned backward and the operations of transactions that were active at the time of the crash are undone in reverse order.

In-depth explanation

1 - Analysis

  • The analysis phase starts at the begin_checkpoint log record, or at the first log record if no begin_checkpoint is present. If it starts at the beginning of the log, the Transaction Table (TT) and Dirty Page Table (DPT) are initialized as empty tables; if not, they are populated with the values saved with the checkpoint.
  • Every log record from the starting point is analyzed: if a page is updated but is currently not in the DPT, it is added along with the LSN of the log record (recLSN). When a transaction first performs an operation, it is added to the TT, and the corresponding lastLSN is updated whenever the transaction performs another operation.

2 - REDO

  • Find the lowest LSN in DPT and begin at the corresponding log post in the log.
  • You should perform a REDO unless one of the following conditions is satisfied (see the sketch after the list):
    1. Page is not in DPT. This means that the page is not dirty.
    2. Page is in DPT, but recLSN > LSN. This means the change will come later in the log.
    3. Page is in the DPT and recLSN $\leq$ LSN, but pageLSN $\geq$ LSN (the LSN stored on the page on disk). This means that this update, or a later one, has already been written to disk.
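
The skip conditions can be collected into one small predicate. A sketch under the assumption that the DPT is a mapping from page id to recLSN and that the pageLSN stored on the disk page is available (parameter names are made up):

def must_redo(log_lsn, page_id, dpt, page_lsn_on_disk):
    """Decide whether a logged update (log_lsn, page_id) must be redone."""
    if page_id not in dpt:                 # 1. page is not dirty
        return False
    if dpt[page_id] > log_lsn:             # 2. recLSN > LSN: the change comes later
        return False
    if page_lsn_on_disk >= log_lsn:        # 3. the change already reached disk
        return False
    return True

dpt = {"P1": 40}
print(must_redo(50, "P1", dpt, page_lsn_on_disk=30))  # True: must be redone
print(must_redo(50, "P1", dpt, page_lsn_on_disk=60))  # False: already on disk
print(must_redo(50, "P2", dpt, page_lsn_on_disk=10))  # False: P2 not in the DPT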

3 - UNDO

  • The transactions in the TT that have not committed are called loser transactions.
  • Pick the highest LSN among the loser transactions and undo it.
  • When there are no more uncommitted transactions left to undo, we are done.

SQL

Retrieving records

SELECT Selects attributes (columns)
FROM Specifies where we want the attributes from. Does not have to be tables; it can be other results (subqueries).
WHERE Gives a condition for the query. Operates on single rows (eg. a single item in a store)
AS Renames a column/attribute for further use in your query
GROUP BY Lets you specify an attribute you want the result grouped by.
HAVING Conditions for a group. Operates on aggregates over multiple rows, eg. a group of items in a store
ORDER BY How you want the result ordered: a list of attributes, each with ASC/DESC.

Aggregate operations can be applied to the extracted data. Usage: SELECT OPERATION(column_name). (See Aggregate functions further down for more.)

OPERATION Explanation
MIN(column_name) Selects the smallest value of specified column
MAX(column_name) Selects the largest value of the specified column
AVG(column_name) Selects the average value of the specified column

Note that WHERE gives conditions on individual rows, while HAVING gives conditions on groups.

Simple select example with conditions

SELECT attributes you want, use commas to separate
FROM where you want this to happen
WHERE condition;

For instance

SELECT s.name, s.address
FROM Student AS s, Institute AS i
WHERE s.instituteID = i.ID AND i.name = "IDI";

Queries with grouped results

SELECT attributes you want, use commas to separate
FROM where you want this to happen
GROUP BY attribute you want it grouped by
HAVING some condition for the group;

You may add this after having if you want the groups sorted.

ORDER BY some_attribute DESC/ASC;

You can also run a regular WHERE-statement and GROUP BY some attribute. Example:

SELECT Item.type, SUM(Item.price)
FROM Item
WHERE price > 200
GROUP BY type
HAVING SUM(Item.price) > 2000;

Modifications of tables or database

Alter a table

Add an attribute

ALTER TABLE table_name
ADD new_attribute_name DATATYPE(length) DEFAULT default_value;

Remove an attribute

ALTER TABLE table_name DROP attribute_name;

Insert new entry

INSERT INTO table_name (column1,column2,column3,...)
VALUES (value1,value2,value3,...);

Delete an entry

DELETE FROM where you want this to happen
WHERE condition;

Update an entry

UPDATE table you want to update
SET new values
WHERE condition;

For instance

UPDATE Pricelist
SET price = 20
WHERE price > 20 AND type = "Beer";

DDL-operations

DDL - Data Definition Language, so the operations here are about defining the data and its structure.

CREATE TABLE table_name (...) Creates a new table with the columns and constraints given in the parentheses
CREATE VIEW query Creates a view based on the query
CREATE ASSERTION Creates a named constraint that must hold across the whole database
CREATE TRIGGER Creates a rule that automatically executes an action when a specified event (e.g. an INSERT, UPDATE or DELETE) occurs

Views

A view is a statement that defines a table without actually creating (storing) one. In other words, it is a virtual table composed of data from other tables. Views are often used to simplify queries and for multi-user access control. To create a view, write a statement of the form:

CREATE VIEW viewName AS (statement);

For instance:

CREATE VIEW IDI_Students AS(
    SELECT s.name
    FROM Student AS s, Institute AS i
    WHERE s.instituteID = i.ID AND i.name = "IDI"
);

Some views may be updated, some may not. There are two standards:

  • SQL-92, which most implementations follow. It is very restrictive regarding updating views, and only allows updating views that use projection or selection.
  • SQL-99, an extension of SQL-92. It allows updating an attribute in a view if the attribute comes from one particular base table and that table's primary key is included in the view.

Functions

Aggregate Functions

AVG() Returns the average value
COUNT() Returns the number of rows
FIRST() Returns the first value
LAST() Returns the last value
MAX() Returns the largest value
MIN() Returns the smallest value
SUM() Returns the sum

Scalar functions

UCASE() Converts a field to upper case
LCASE() Converts a field to lower case
MID() Extract characters from a text field
LEN() Returns the length of a text field
ROUND() Rounds a numeric field to the number of decimals specified
NOW() Returns the current system date and time
FORMAT() Formats how a field is to be displayed

Algorithms for natural join/equijoin

The JOIN operation is one of the most time-consuming operations, and there are many different ways of doing it. A two-way join joins two files on one attribute from each; we call these the join attributes. The methods below are for two-way NATURAL JOIN or EQUIJOIN. There is pseudocode for them in the book.

Nested-loop join

The default brute-force algorithm. It does not require any special access structures. It has an outer and an inner loop, and compares records on the join attributes; if there is a match, they are joined.

  • $O$ is the number of blocks in the file of the outer loop.
  • $I$ is the number of blocks in the file of the inner loop.
  • $b$ is the number of available buffer blocks.

$\text{I/O} = O + \lceil \frac{O}{b - 2} \rceil * I$

$\lceil \frac{O}{b - 2} \rceil$ is the number of iterations in the outer loop. For each outer iteration, the algorithm iterates through all the blocks in the inner loop. In addition, all the blocks in the outer loop have to be read once.
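
A small sketch of the cost formula with made-up file and buffer sizes, just to show the arithmetic:

from math import ceil

def nested_loop_io(outer_blocks, inner_blocks, buffer_blocks):
    """I/O cost of block nested-loop join: O + ceil(O / (b - 2)) * I."""
    return outer_blocks + ceil(outer_blocks / (buffer_blocks - 2)) * inner_blocks

# Hypothetical sizes: 100-block outer file, 400-block inner file, 12 buffers.
print(nested_loop_io(100, 400, 12))  # 100 + ceil(100/10) * 400 = 4100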

Single-loop join

If an index or hash key exists for one of the two join attributes, say attribute A of file X, then we retrieve each record of the other file with respect to the attribute. In our example let's say that is file Y with respect to attribute A. This is done with one single loop over file Y. Then we use the access structure (such as index or hash key) to retrieve directly all matching records from file X that satisfy x[B] = y[A], where x and y are records from X and Y respectively.

$i$ = the number of index levels for the attribute you are joining on, $r$ = the number of records you are looking for, and $b$ = the number of blocks over which these are spread.

$\text{I/O} = b + r(i + 3)$

Sort-merge join

Given that both files are physically ordered on the join attributes, we can implement the join in the most efficient way possible: both files are scanned concurrently in order of the join attributes, and records that match on the join attributes are joined. If the files are not sorted, they must first be sorted with some external sorting algorithm.

If we have two files of size $b_1$ and $b_2$ blocks respectively, the total number of disk accesses needed to join them is $n = b_1 + b_2$. For each file that must be sorted, add the number of disk accesses needed for external sorting (see The sort-merge algorithm for external sorting).

The sort-merge algorithm for external sorting

TL;DR:

  • $b$: Blocks to be sorted.
  • $n_{B}$: Available buffer blocks.
  • $n_{R} = \lceil\frac{b}{n_{B}}\rceil$: Number of initial runs.
  • $d_{M} = \min(n_{B}-1, n_{R})$: Degree of merging.

Total number of disk accesses: $(2*b) + (2*b)*\log_{d_{M}}(n_{R})$.

The long version: The sort-merge algorithm is the only external sorting algorithm covered in this course. The word external refers to the fact that the data is too large to be stored in main memory, and thus is stored on some external storage, such as a hard drive. The algorithm is very similar to the classic merge sort algorithm. We will look at the total number of disk accesses needed for the algorithm, as well as giving an explanation of why it is so.

Sort phase: We start with a file of size $b$ blocks. We have $n_{B}$ available buffers in main memory, each able to store one block from the disk. We read a portion of as many blocks as possible ($= n_{B}$) into the main buffers; each of these portions is called a run. Each run is sorted by some internal sorting algorithm, and the results are written to disk after sorting. The number of such initial runs is called $n_{R}$, and it is equal to the total size divided by the total buffer space: $n_{R} = \lceil\frac{b}{n_{B}}\rceil$. Because each block has to be read from disk and then written back, the number of disk accesses needed for this phase is $2*b$, which we'll call $n_{SORT-PHASE}$.

Merge phase: When the algorithm has sorted each of the initial runs and written them to disk, it proceeds to a merge pass, which merges as many runs as possible. In this phase, one of the buffers has to be reserved for the result of the merging. The rest of the buffers are used to store the blocks that are to be merged. The computer reads the first blocks from $n_{B}-1$ runs into the buffers – unless we have fewer than $n_{B}-1$ initial runs, and in that case, $n_{R}$ blocks will be read into memory at a time. This number, the number of blocks being merged at a time, will be called the degree of merging, $d_{M}$. When the blocks have been read into memory, the computer proceeds to find the smallest number from each. When this has been found, it is removed from its buffer and put in the result buffer. When the result block is full, it is written to disk; when a buffer is empty, the next block from that sorted run is read into memory.

The number of merge passes: If we are not able to merge all runs in one merge pass, we might have to do another … and perhaps then another. If you've taken a course in either discrete mathematics or algorithms, you will recognize that this merge phase can be illustrated as a full $d_{M}$-ary tree – each leaf node being an initial run and each internal node the result of a merge. The height of any full $n$-ary tree is decided by the number of leaves (here, the initial runs), which you take the logarithm of, the logarithm having a base equal to the number of children each internal node can have. You'll probably see that this is the number of runs that can be merged in each pass, the degree of merging ($d_{M}$). Thus, the height of the merge tree will be $\log_{d_{M}}(n_{R})$. Because each merge pass has to both read and write a number of blocks equal to the size of the initial file, each merge pass also takes $2*b$ disk accesses. Now, we are almost at the solution: $n_{MERGE-PHASE} = (2*b)*\log_{d_{M}}(n_{R})$.

The total number of disk accesses is thus: $$n_{SORT-MERGE} = n_{SORT-PHASE} + n_{MERGE-PHASE} = (2*b) + (2*b)*\log_{d_{M}}(n_{R})$$
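
The same quantities plugged into a small sketch (the file and buffer sizes are arbitrary; the number of merge passes is rounded up to whole passes):

from math import ceil, log

def external_sort_io(b, n_b):
    """Disk accesses for sort-merge external sorting of b blocks with n_b buffers.

    Assumes at least three buffer blocks, so the degree of merging is >= 2.
    """
    n_r = ceil(b / n_b)                            # number of initial runs
    d_m = min(n_b - 1, n_r)                        # degree of merging
    merge_passes = ceil(log(n_r, d_m)) if n_r > 1 else 0
    return 2 * b + 2 * b * merge_passes

# Hypothetical: 1024 blocks, 5 buffer blocks
print(external_sort_io(1024, 5))  # 205 runs, d_M = 4, 4 merge passes -> 10240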

Partition-hash join

Uses a single hash function to partition both files, then compares records that fall in the same hash bucket.

We'll explain this with an example. Let's say we have two files X and Y and join attributes A and B respectively, and X has fewer records than Y.

Partition phase: First, we make a single pass through file X and hash its records with a hash function h. Let's say X is small enough that all the hashed records fit in memory. Now all records with the same hash value h(A) are in the same hash bucket.

Probing phase: We make a single pass through the other file Y and hash each record with the same hash function, using h(B) to probe the corresponding bucket; each Y record is joined with all matching records from X in that bucket.
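
A minimal in-memory sketch of the two phases for the simple case where the smaller file fits in memory (the relations and attribute names are made up):

from collections import defaultdict

def partition_hash_join(x_records, y_records, a, b):
    """Join records of X and Y where x[a] == y[b], X being the smaller file."""
    # Partition phase: one pass over X, hashing on the join attribute A.
    buckets = defaultdict(list)
    for x in x_records:
        buckets[hash(x[a])].append(x)
    # Probing phase: one pass over Y, probing the matching bucket with h(B).
    result = []
    for y in y_records:
        for x in buckets[hash(y[b])]:
            if x[a] == y[b]:                # guard against hash collisions
                result.append({**x, **y})
    return result

X = [{"dept_id": 1, "dept": "IDI"}, {"dept_id": 2, "dept": "IME"}]
Y = [{"name": "Ada", "works_in": 1}, {"name": "Bob", "works_in": 1}]
print(partition_hash_join(X, Y, "dept_id", "works_in"))
# Two joined records, both for dept_id 1 ("IDI")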

The general case partition hash-join (when the smallest file does not fit entirely in the main memory) is not a part of the curriculum (as of May 2014).

Primitive operators

Projection (π) The projection of a relation R on a set of attributes A is the set of all tuples in R restricted to the attributes in A. Roughly corresponds to the column list of SQL's SELECT.
Selection (σ) A selection with a set of conditions C selects all tuples in a relation R for which the conditions in C hold. Roughly corresponds to SQL's WHERE.
Rename (ρ) Renames attributes in a relation. Roughly corresponds to SQL's AS.
Natural Join (*) Joins two relations R and S. The result is the set of all combinations of tuples in R and S that are equal on their common attribute names. Roughly corresponds to SQL's NATURAL JOIN (an inner join on the common attributes).
Theta Join (⋈) Joins on an arbitrary condition.
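
As a small illustration, the first two operators can be written over relations represented as Python sets of tuples, which also shows why a projection automatically gives a distinct result (as noted in the relational algebra section); all names are made up:

# A relation as a set of tuples, with a separate tuple of attribute names.
attrs = ("name", "city")
R = {("Ada", "Trondheim"), ("Bob", "Oslo"), ("Eva", "Trondheim")}

def project(rel, rel_attrs, keep):
    """π: keep only the attributes in `keep`. Set semantics make the result distinct."""
    idx = [rel_attrs.index(a) for a in keep]
    return {tuple(t[i] for i in idx) for t in rel}

def select(rel, rel_attrs, condition):
    """σ: keep only the tuples satisfying `condition`."""
    return {t for t in rel if condition(dict(zip(rel_attrs, t)))}

print(project(R, attrs, ["city"]))                      # {('Trondheim',), ('Oslo',)}
print(select(R, attrs, lambda t: t["city"] == "Oslo"))  # {('Bob', 'Oslo')}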

Other relevant documents:

  • Compendiums written by individuals
  • Lecture notes from chapter 16 to chapter 24
  • "How to fake database design"
  • "Simple Conditions for Guaranteeing Higher Normal Forms in Relational Databases", a shortcut to fifth normal form
  • "Method for determining candidate keys and highest normal form of a relation based on functional dependencies"

Video lectures

Apart from the first video, only Svein Erik Bratsberg's lectures have been recorded.

https://video.adm.ntnu.no/openVideo/serier/5124992265fdb

  1. Normalization theory (Roger Midtstraum)
  2. Intro part 2: learning goals. Related to Fellesprosjektet (the joint project).
  3. ...
  4. ...
  5. Storage and indexes
  6. ...
  7. ...
  8. ...
  9. ...
  10. ...
  11. Transactions
  12. Transactions
  13. Transactions: recoverability
  14. ...
  15. Recovery
  16. ...

Or videos from spring 2019: https://mediasite.ntnu.no/Mediasite/Catalog/catalogs/tdt4145_v19

Written by

Stian Jensen cristea duvholt odd eirikvaa agavaa hambro Anddor ek jonasft oyvindrobertsen Kaldahl Martin Hallén Vages thormartin91 trmd magikarpWhisperer Esso Skanin ilsegv liangzh ばかへんたい Ose magnusnn sawri blixhavn eddern
Last updated: Thu, 11 Apr 2019 10:42:54 +0200 .