Table of Contents

What Goes Around Comes Around
Architecture of a Database System
Spørreoptimalisering
Parallelle og distribuerte databaser
Skyline
RankJoin
Kolonnebaserte databasesystemer
LSM
Data Streams Management
Storm
Bursty Time Series

‹

TDT4150: Avanserte Databasesystemer

Tags:

Ting er skrevet på en blanding av norsk og engelsk

What Goes Around Comes Around

Architecture of a Database System

Introduction

Hovedkomponentene i et RDBMS

Hovedkomponenter i et RDBMS.

Livet til en spørring

Aksess til API som gjør kall over nettverk for kobling med Client Communication Manager(CCM)
- autentisering, oppretting av tilstand, etc
- Two-tier/client-server, three-tier (middle-tier), application server
Ved mottak av SQL-kommando må DBMS-en tilordne en ”thread of computation”. Det må også sørges for at trådens data og kontrollutputt er koblet til klienten via communication manager. Dette er jobben til Process Manager (PM)
- Admission control: prosessering nå eller vente på ressursar
Relational Query Processor (RQP) sjekker om bruker er autorisert til å kjøre spørringen og kompilerer SQL-spørringen til en spørreplan. Den kompilerte spørreplanen blir håndtert av plan executor. Plan executor-en består av en pakke med “operators” for å utføre en spørring. Typiske operatorer inkluderer prosseseringsoppgaver som joins, selection, projection, aggregation, sorting o.s.v., samt kall for å requeste data records fra lavere lag i systemet.
Operatorar gjør kall til Transactional Storage Manager (TSM), som styrer alle data aksess- (read) og manipulerings- (create, update, delete) kall. Nødvendige låser tilegnes for å sikre korrekt utførelse ved spørringer som skjer samtidig. Ved oppdatering vil den ha interaksjon med låse-handterer for å sørge for at transaskjonen var durable hvis den ble commited, og fullstendig undone hvis den ble avbrutt.
Data blir returnert til klienten. Det skjer ved å gå baklengs gjennom aktivitetene som er beskrevet til nå.

Process Models

Operating System Process

An Operating System Process combines an operating system (OS) program execution unit (a thread of control) with an address space private to the process. Included in the state maintained for a process are OS resource handles and the security context. This single unit of program execution is scheduled by the OS kernel and each process has its own unique address space.

Operating System Thread

An Operating System Thread is an OS program execution unit without additional private OS context and without a private address space. Each OS thread has full access to the memory of other threads executing within the same multithreaded OS Process. Thread execution is scheduled by the operating system kernel scheduler and these threads are often called “kernel threads” or k-threads.

Lightweight Thread Package

A Lightweight Thread Package is an application-level construct that supports multiple threads within a single OS process. Unlike OS threads scheduled by the OS, lightweight threads are scheduled by an application-level thread scheduler without kernel scheduler involvement or knowledge. Lightweight threads have the advantage of faster thread switches when compared to OS threads since there is no need to do an OS kernel mode switch to schedule the next thread. Lightweight threads have the disadvantage, however, that any blocking operation such as a synchronous I/O by any thread will block all threads in the process. Generally, lightweight threads offer a more difficult programming model than writing software based on either OS processes or OS threads.

DBMS Client

A DBMS Client is the software component that implements the API used by application programs to communicate with a DBMS.

DBMS Worker

A DBMS Worker is the thread of execution in the DBMS that does work on behalf of a DBMS Client. A 1:1 mapping exists between a DBMS worker and a DBMS Client: the DBMS worker handles all SQL requests from a single DBMS Client. The DBMS client sends SQL requests to the DBMS server. The worker executes each request and returns the result to the client.

Process per DBMS Worker / Process per Connection

++ OS handterer tidsdeling av DBMS workers
++ DBMS programerer kan stole på OS-beskyttelsesfasiliteter
++ Enkel å implementere
-- Ikke skalerbar
-- Nødvendig med in-memory data-strukturer som er delt på tvers av DBMS-forbindelser. In practice the required extensive use of shared memory in this model reduces some of the advantages of address space separation, given that a good fraction of “interesting” memory is shared across processes.

Thread per DBMS Worker / Server Process

In the thread per DBMS worker model a single multithreaded process hosts all the DBMS worker activity. A dispatcher thread (or a small handful of such threads) listens for new DBMS client connections. Each connection is allocated a new thread. As each client submits SQL requests, the request is executed entirely by its corresponding thread running a DBMS worker. This thread runs within the DBMS process and, once complete, the result is returned to the client and the thread waits on the connection for the next request from that same client.

++ Effektiv
++ Skalerer bra for mange samtidige forbindelser
-- OS-et beskytter ikke trådene fra hverandres minne,
-- Debugging er vanskelig (spesielt for race conditions)
-- Vanskelig å porte mellom operativsystem p.g.a. forskjellig trådvarianter
-- Avhengig av asynkron I/O (uproblematisk på nye OS)

Process Pool

Som i prosess per DBMS worker, men OS håndterer tidsdeling. This model is a variant of process per DBMS worker. With process pool rather than allocating a full process per DBMS worker, they are hosted by a pool of processes. A central process holds all DBMS client connections and, as each SQL request comes in from a client, the request is given to one of the processes in the process pool. The SQL Statement is executed through to completion, the result is returned to the database client, and the process is returned to the pool to be allocated to the next request. The process pool size is bounded and often fixed. If a request comes in and all processes are already servicing other requests, the new request must wait for a process to become available. Process pool is often implemented with a dynamically resizable process pool where the pool grows potentially to some maximum number when a large number of concurrent requests arrive. When the request load is lighter, the process pool can be reduced to fewer waiting processes.

++ har alle fordelene til process per DBMS worker
++ mer minne-effektiv enn process per DBMS worker fordi et mye mindre antall prosesser er nødvenig Ofte anbefalt der en har svært mange brukere tilkoblet samtidig

Server Process + I/O Processes (er på foilene og den gamle artikkelen)

Reduserer problemet med manglande støtte for asynkron I/O i OS-et (og dermed ikkje så viktig lenger) Bruker egne I/O-prosesser for å håndtere disk En prosess per disk to ensure that the system can handle multiple requests to separate devices in parallel.

Oppsummering av modellene

All models described above aim to execute concurrent client requests as independently as possible. Yet, full DBMS worker independence and isolation is not possible, since they are operating on the same shared database. In the thread per DBMS worker model, data sharing is easy with all threads run in the same address space. In other models, shared memory is used for shared data structures and state.

Parallellitet

Shared memory

A shared-memory parallel system is one in which all processors can access the same RAM and disk with roughly the same performance. Multi-core processors support multiple processing cores on a single chip and share some infrastructure such as caches and the memory bus. This makes them quite similar to a shared-memory architecture in terms of their programming model. Today, nearly all serious database deployments involve multiple processors, with each processor having more than one CPU. - Prosessmodell som for uni-prosessor - Hovedutfordring: utnytte flere CPU-er til utføring av spørring

Shared nothing

A shared-nothing parallel system is made up of a cluster of independent machines that communicate with each other. Shared-nothing systems provide no hardware sharing abstractions, leaving coordination of the various machines entirely in the hands of the DBMS.
The most common technique employed by DBMSs to support these clusters is to run their standard process model on each machine, or node, in the cluster. The main difference, compared to a single shared memory system, is that each system in the cluster stores only a portion of the data. The tables are spread over multiple systems in the cluster using horizontal data partitioning to allow each processor to execute independently of the others. Each tuple in the database is assigned to an individual machine, and hence each table is sliced “horizontally” and spread across the machines.
Each individual machine is responsible for the access, locking and logging of the data on its local disks.
The query executors on the various machines ship data requests and tuples to each other, but do not need to transfer any thread state or other low-level information. As a result of this value-based partitioning of the database tuples, minimal coordination is required in these systems.
Good partitioning of the data is required for good performance. This places a significant burden on the Database Administrator (DBA) to lay out tables intelligently, and on the query optimizer to do a good job partitioning the workload.
The processors must exchange explicit control messages for issues like distributed deadlock detection and two-phase commit This requires additional logic, and can be a performance bottleneck if not done carefully
Partial failure is a possibility that has to be managed in a shared-nothing system. In a shared-memory system, the failure of a processor typically results in shutdown of the entire machine, and hence the entire DBMS. In a shared-nothing system, the failure of a single node will not necessarily affect other nodes in the cluster. But it will certainly affect the overall behavior of the DBMS, since the failed node hosts some fraction of the data in the database.
- Løsning 1: bring down all nodes if one node fail
- Løsning 2: “Data skip” allow queries to be executed on any nodes that are up, “skipping” the data on the failed node
- Løsning 3: employ redundancy schemes
Svært skalerbar og kostnadseffektiv

Shared disk

A shared-disk parallel system is one in which all processors can access the disks with about the same performance, but are unable to access each other’s RAM.
One potential advantage of shared-disk over shared-nothing systems is their lower cost of administration. DBAs of shared-disk systems do not have to consider partitioning tables across machines in order to achieve parallelism. But very large databases still typically do require partitioning so, at this scale, the difference becomes less pronounced.
That the failure of a single DBMS processing node does not affect the other nodes’ ability to access the entire database.
However, even with these advantages, shared-disk systems are still vulnerable to some single points of failure. If the data is damaged or otherwise corrupted by hardware or software failure before reaching the storage subsystem, then all nodes in the system will have access to only this corrupt page.
Explicit coordination of data sharing across the machines is needed. Shared-disk systems depend upon a distributed lock manager facility, and a cache-coherency protocol for managing the distributed buffer pools. These are complex software components, and can be bottlenecks for workloads with significant contention.

NUMA

Non-Uniform Memory Access (NUMA) systems provide a shared memory programming model over a cluster of systems with independent memories. Each system in the cluster can access its own local memory quickly, whereas remote memory access across the high-speed cluster interconnect is somewhat delayed. NUMA hardware architectures are an interesting middle ground between shared-nothing and shared-memory systems. They are much easier to program than shared-nothing clusters, and also scale to more processors than shared-memory systems by avoiding shared points of contention such as shared-memory buses. NUMA clusters have never achieved any significant market share.

DBMS Threads and Multi-processors

The natural implementation of the lightweight DBMS thread package described in Section is one where all threads run within a single OS process. Unfortunately, a single process can only be executed on one processor at a time.

Relational Query Processor

A relational query processor takes a declarative SQL statement, validates it, optimizes it into a procedural dataflow execution plan, and (subject to admission control) executes that dataflow program on behalf of a client program. The client program then fetches (“pulls”) the result tuples, typically one at a time or in small batches. In general, relational query processing can be viewed as a single-user, single-threaded task. Concurrency control is managed transparently by lower layers of the system (storage manager). Spørreprosessering: 1) Parsing og autentisering, 2) Omskriving (rewriting), 3) Optimalisering, 4) Utføring

Storage Manager

Two basic types of DBMS storage managers are in commercial use today: either (1) the DBMS interacts directly with the low-level blockmode device drivers for the disks (often called raw-mode access), or (2) the DBMS uses standard OS file system facilities. This decision affects the DBMS’s ability to control storage in both space and time.

Spørreoptimalisering

SQL queries are converted into units called blocks. Blocks are translated into (extended) relational algebra expressions. The central task of an optimizer is to find a good plan for evaluating such expressions. Optimizing a relational algebra expression involves two basic steps:

Enumerating alternative plans for evaluating the expression. Typically an optimizer considers a subset of all possible plans because the number of possible plans is very large
Estimating the cost of each enumerated plan and choosing the plan with the lowest cost

Translating SQL queries into algebra

SQL queries are optimized by decomposing them into a collection of smaller units, called blocks. A typical relational query optimizer concentrates on optimizing a single block at a time.

A query block is an SQL query with no nesting and exactly one SELECT clause and one FROM clause and at most one WHERE clause, GROUP BY clause, and HAVING clause.

The first step in optimizing a query block is to express it as a relational algebra expression. Every SQL query block can be expressed as an extended algebra expression. The SELECT clause correspond to the projection operator, the WHERE clause correspond to the selection operator, the FROM clause correspond to cross-product relations, and the remaining clauses are mapped to corresponding operators in a straightforward manner.

A query is essentially treated as a σπx algebra expression, with the remaining operators (if any) carried out on the result of the σπx expression.

Estimating the cost of a plan

There are two parts to estimating the cost of an evaluation plan for a query block:

For each node in the tree, we must estimate the cost of performing the corresponding operation.
For each node in the tree, we must estimate the size of the result and whether it is sorted. This result is the input for the operation that corresponds to the parent of the current node, and the size and sort order in turn effect the estimation of size, cost and sort order for the parent.

Estimating the result size

Reduction factor

The maximum number of tuples in the result of a query is the product of the cardinalities of the relations in the FROM clause. Every term in the WHERE clause, however, eliminates some of these potential result tuples.We can model the effect of a where clause on the result size by associating a reduction factor with each term, which is the ratio of the (expected) result size to the input size considering only the selection represented by the term. The actual size can be estimated as the maximum size times the product of the reduction factors for the terms in the WHERE clause.

Histograms

A histogram is a data structure maintained by a DBMS to approximate a data distribution.

In an equiwidth histogram, we divide the range into subranges of equal size. We could also choose subranges such that the number of tuples within each subrange is equal. Such a histogram is called an equidepth histogram

Relational algebra equivalences

Algebra equivalences allow us to convert cross-products to joins, choose different join orders, and push selections and projections ahead of joins.

Enumeration of alternative plans

Given a query, an optimizer essentially enumerates a certain set of plans and chooses the plan with the least estimated cost. Not all algebraically equivalent plans are considered, because doing so would make the cost of optimization prohibitively expensive for all but the simplest queries.