TTM4110: Pålitelighet og ytelse med simulering
The purpose of this course is to give an introduction to the conceptual and theoretical fundamentals of dependability and performance of ICT systems. Mathematical and software tools that can be used to analyse and dimension systems and network solutions are presented and basic issues are discussed.
Chapter 1: Introduction
Functional and nonfunctional properties
Technical systems are described using two types of characteristics:
Functional properties: Which functions are performed.
Nonfunctional properties: How well these functions are performed.
These terms can be used to describe systems, let's say a car: The primary function of a car is to transport people and goods from one place to another. Nonfunctional properties include the carrying capacity, the maximum speed, whether the car starts when needed, whether it transports people and goods to destination without breaking down or causing accidents. These are the performance (carrying capacity, speed) and dependability (the car starts and fulfills its function) properties of the system. Along with the price they are rational arguments for comparing different designs.
When designing an ICTsystem, the focus is mainly on the functions the system shall provide and how to implement them. However, in a real system, it is also crucial to concider nonfunctional properties such as dependability and performance. The nonfunctional requirements will have an impact on both the design and the cost of the system and determine its usability.
Models
Every dependability and performance evaluation of a system, either based on mathemactical analysis, simulation or measurements, always relies on a model of the system.
Model: A model is an abstraction of the real or projected system.
There is no standard recipe to elaborate a good model of a system and tradeoffs must be made. On one hand a model must include sufficient details to represent the system, but on the other hand less important details must be left out to enable simulation in a reasonable time. Assumptions must be made so that the model can be expressed analytically, but at the same time one must ensure that the model still describes the real world. The results derived from a model should be valid for the real system, not just for the model.
The description of the system itself and the identification of what to include in the model are essential, but not sufficient. How the environments influence the system and vice versa must also be concidered.
Systems
A system can be defined as a regularly interacting or interdependent group of items forming a unified whole, where an item may be a sytem, a subsystem, or an atomic component. When performing a dependability and/or performance evaluation of a system, the aim is to identify the items (system components) that limit the dependability and/or performance. The structure of the system reflects how these components interact. The interactions themselves are referred to as the behavior of the system.
System components
An ICTsystem is composed of system components of different types, for instance: processors (with processing capacity), hard disks (with storage capacity), transmission channels (with transmission capacity) and so on. These system components and their capacities are the resources in an ICTsystem. The amount of resources limits the system dependability and performance. The resources are what is utilized when the system is used.
Structure
The structure of a system indicates how the resources in the system must or should be utilized in order to deliver the service of which the dependability and/or performance properties are evaluated. The structure of a system may be physical, logical, or derived from the physical and/or logical structures.
Behaviour
It is necessary to simplify and make an abstraction of the behavior of the real system. Important aspects of the behavior that may be included in a model may be:

Queueing diciplines: What to do when all resources in the system is busy and someone/something new wants to use the sysem.

Protocols: Provides rules for the different entities in a system/network so they can cooperate

Traffic mechanisms apart from protocols: Routing algorithms, CAC, UPC and so on.

Faulthandling: Mechanisms which encompass error detection, localization, isolation, and various techniques to provide fault tolerance and automatic and manual fault removal (repair). Important to include the possibility that the system does not behave as intended, e.g. that an error in the system is not detected.
Concepts and terminology
This section introduces concepts and terminology related to dependability and performance.
Quality of service
Quality of service (QoS): Degree of compliance of a service to the agreement that exists between the user and the provider of this service.
In this context, a user is an entity that uses a service provided by another entity, but is not necessarily an enduser of the service. A service is a set of functions (service primities) that are offered on an interface between the user and the provider (not necessarily a physical interface). A QoS parameter is a (random) variable that characterizes the service.
Dependability
Dependability: Trustworthiness of a system such that reliance can justifiably be placed on the service it delivers.
Failure: Deviation of the delivered service from the compliance with the specification. Transition from correct service to incorrect service (e.g. the service becomes unavailable.
Error: Part of the system state which is liable to lead to a failure.
Fault: Adjudged or hypothesized cause of an error.
Example: An electromagnetic pulse (fault) results in flipping of a bit in a data register (error). When this register is accessed, a wrong result is returned to the user (failure). Another example is the software engineer who writes an incorrect code and thereby introduces a logical fault into a software module. This incorrect code is a (dormant) fault embedded in the system. Certain input values will activate the fault and there will be an error in the service. Services delivered by the software module may later crash or produce incorrect output (failure).
Two basic approaches to achieve a dependable system:
fault prevention, i.e. to prevent the occurrance or introduction of faults.
fault tolerance, i.e. to prevent that errors cause failures, or in other words, to deliver a correct service despite the presence of faults.
Types of faults: Physical faults (physical wear on components), transient faults (only present for a short period of time), intermittent faults (faults come and go), design faults (Human made faults during specification, design and implementation of a system), interaction or operational faults (Accidental faults made by humans operating or maintaining a system), and environmental faults (faults originating outside the system bounderies).
Availability: Ability of a system to provide a set of services at a given instant of time or at any instant within a given time interval.
Reliability: Ability of a system to provide uninterrupted service.
Safety: Ability of a system to provide service without the occurrence of catastrophic failures.
Performance
Performance: Ability of a system to provide the resources needed to deliver its services.
Capacity: Maximum load a system can handle per time unit.
Throughput: Portion of the system capacity that is utilized by the users.
Delay: Time it takes to complete a service.
Waiting time: Accumulated time a service reuqest is pendingfor service in a system.(W)
Service time: Accumulaed time a service request is served by a system. (X)
Sojourn time: Total time a service request is in a system. (S = W + X)
Methods:
Mathematical analysis
In the case of a wellknown probelm, results are rapidly and easily obtained using formulas or algrotihms. Explicit parametric relations can be found for simple models, and parametric analysis, e.g. sensitivity analysis, is relatively easy.
Simulation:
Relatively detailed and realistic models can be used. Simulation allows o use system models with an arbitrary level of deatils. The challenge is to include all that is relevant for the evaluation but nothing more. A lower level of knowledge is required to design new models
Measurement on prototype of a system
Results are trustworthy since they are not affected by assumptions and simplifications. Results are realitic and detailed.
Chapter 2: Fundamentals of probability
Not completed yet, feel free to contribute here.
Expected value:
Sample mean (avarage):
Variance:
Sample variance:
Standard deviation:
Standard error:
Intensity:
Conditional probability:
Baye's formula:
Negative exponential distribution (n.e.d) (probability density function):
Erlang k distribution probability density function (k is shape and λ is rate of each phase):
and cumulative:
Death rate (failure rate):
Chapter 4: Simulation
Simulation
Imitative representaion of the functioning of one system or process by means of the functioning of another.
In the context of this course: A simulation is performed by means of a software program, called a simulator, running on a computer. By simulation, we really mean stochastic simulation where the behaviour of the system components is, fully or partly, governed by stochastic processes. To imitate a stochastic behavior, the simulator, and therefore the computer, has to be able to generate random variables according to a given probability distribution.
Random variable generation:
Simulation reilies on the generation of random variables, and therefore on the generation of random numbers. Generated random variables are called random variates. Random numbers may be generated either from a physical process or from an algorithm. The main drawback of generating random numbers from a physical process is that the sequence is not reproducible. Generating random numbers from an algorithm using a computer allows to generate a large amount of numbers and to reproduce the same sequence several times which enables to reproduce the same experiments. Random numbers generated by an algorithm are not truly random since the algorithm itself is deterministic and are therefore called pseudorandom numbers. However, the sequence of numbers generated should appear to be statisically random (independent and uniformly distributed random numbers).
Requirements: Fast (many samples), portable, long cycle period, reproducible sequences, good statistical properties.
Direct method:
Pro: Fast, empirical distribution
Con: Only applicable to discrete distributions.
Inverse Transform:
Pro: Fast
Con: Distributions require a closed form inverse transform.
AcceptanceRejection:
Pro: Many distributions.
Con: Many samples per random variate generated.
Convolution:
Pro: Simple implementation of complicated distributions.
Con: Only applicable to sum of variates.
Advantages and disadvantages of simulation
Simulation is a flexible and practical way of analyzing systems. Simulation allows to make models that include exactly the necessary level of granularity, not more not less. A simulation model is better tailored for a given study than a mathematical model or a real system. A mathematical model often needs to be prohibitevely simplified to be solved analytically. A real system used for a measurement study can obviously not be simplified at all.
A simulation model is often more challenging and more demanding with respect to the insight into the system and the understanding of its operation. One must be able to remove unnecessary details and at the same time understand and describe the details of importance for a given study. Hence, a common and important byproduct of a simulation study is an improved insight into the system and the identification of possible weaknesses in the design.
Types of simulation models
Simulation can be classified as static or dynamic, discrete or continuous, and deterministic or stochastic.
In this course we use dynamic, discrete, and stochastic simulation models.
Static simulation model
Also called MonteCarlo simulation models a particular point in time (time not part of model) E.g throwing a dice.
Dynamic simulation model
Model represents a system as it changes over time. E.g. simulate a bank from 08:0016:00.
Discrete simulation model
State variables change only at discrete set of points in time.
Continuous simulation model
State variable changes continuously over time.
Deterministic simulation model
Contains no random variables.
Stochastic simulation model
Contains at least one random input variable
Discrete event simulation (DES):
The system dynamics is given by discrete events. The system state is distinct and will only change at a specific time instance. Simulations of discrete events need only to simulate the events that change the system state, and a finite number of events will take place in a finite interval of time.
Time driven vs. eventdriven simulation:
Time driven: Fixed time stamp. For example: Move the clock 1 second forward and determine the system state.
Event driven: Variable timestamp. For example: Move the clock to the time of next event.
Concepts in Discrete event simulation:
Model: Abstract representation of system.
Entity: Object with explicit model description.
Attributes: Properties of an entity.
System: Collection of entities.
System state: Set of variables containing sufficient information.
List: Collection of associated entities.
Event: Change of system state.
Event notice: A record of an event to occur.
Event list: List of event notices (ordered by time)
Activity: Specified duration.
Delay: Unspecified duration.
Resource: Passive object used to perform an activity.
Process oriented simulation:
A Process describes the behaviour of an entity (sequence of activities, life cycle). Focus on the entities and their dependencies and interactions (objectoriented approach). Eventdriven approach. Requires a list of events and synchronization mechanisms.
Activity diagram:
A graphical representation of a model. Activites/lifecycles of entities. Interactions between entities (cooperation, interrupt, generation, priorities). Entities and resources (comptetition, synchronization).
Chapter 5: Traffic Models
This chapter presents the Poisson process and its properties. In particular, it describes how Poisson processes can be aggregated to form Markov processes that are an important part of Markov models.
Poisson and Markov Process
Poisson process: Regular point process (only one event at the time). Intensity is constant (homogenous PP). Intensity is independent of previous events.
Markov process: Events from many Poisson Processes.
Poisson distribution: Number of events until and including time t.
Exponential distribution: Time between two consecutive events.
Poisson process merging
The merging (superposition, multiplexing) of two Poisson processes is a Poisson process.
Splitting of a Posson process in n processes forms n Poisson processes if the splitting is probabilitstic (with constant probability).
Stationary traffic models
Examples of queuing systems
Phone subscirbers share a limited number of telephone lines in a telephone exchange
Mobile phone subscribers share a limited number of radio transmitters/receivers in a mobile base station.
Taxi call centers have delay queues.
IP routers have input and output buffers.
Maintenance centers have order lists and repairmen.
Performance: Delay, blocking, utilization, waiting time...
Queuing model
A system does not need to have any queuing positions to be classified as a queuing system.
A queueing system is characterized by the arrival process, service time distribution, number of servers, maimum number of customers in the system, number of sources, and queuing dicipline (service order).
Random extensions
Use of modeling in development and dimensioning.
TODO
Point process  Point process describes occurrence of an event  No gradation of type of event  The state, X(t), of a process is the number of events, N(t) Regular point process only one event at a time.
Time to k'th event:
$$ S_k = \sum_{i=1}^k Y_i $$ har også at$$ P(N(t)<k) = P(S_k>t) $$
sd
Renewal process Regular point processes Time to next event is independent of what happened in the past The superposition of merging two renewal processes is generally not a renewal process. However, the superposition of a large number of renewal processes tends to a Poisson process.
Little's formula:
Erlang: Erlang is a dimensionless unit that is used as a measure for offered load or carried load from serviceproviding elements. Often expressed as call arrival rate times average call length. Given a scenario with a single cord in telephone system. The cord can provide phone calls for 60 minutes per hour. A full utilization of the cord results in 1 erlang. If users now attempts to make calls in this system (measured in eg. per hour): Carried load is the traffic that was carried for the successful callattempts. Offered load is the traffic that would have been carried if all the callattempts succeeded.
Offered traffic (in erlangs) is related to the call arrival rate, λ, and the average callholding time (the average time of a phone call), h, by:
Relative frequency: Number of times the event occurs divided by the number of times an experiment is carried out
Probability: Asymptotic relative frequency of the event investigated
Joint probability:
The probability that both events occurs in conjunction
(Intersection)
Conditional probability: If A and B are two events, and P(B) > 0 P(AB) = P(A ∩ B) P(B)
Multiplicative rule P(A ∩ B) = P(AB)P(B)
Law of total probability: P(B) = P j P(B ∩ Aj) = P j P(BAj) · P(Aj)
Negative exponential distribution:
X ∼ Exp(λ)
Erlangk distribution: It is the distribution of the sum of k random variables identically distributed.
X ~ Erlangk(λ)
Binomial distribution:
Probability distribution of the number X of events 1 in a
sequence of n independent Bernoulli trials.
X ∼ Bin(n, p)
Poisson distribution:
Corresponds to the binomial distribution with an infinite number
of Bernoulli trials.
Probability of the number of occurrences of an event, given
that it occurs on average at the intensity
Poissoon process:
TODO: format
• Regular point process (only one event at the time) • Intensity is constant (homogeneous PP) • Intensity is independent of previous events The merging (superposition, multiplexing) of two Poisson processes is a Poisson process Splitting of a Poisson process in n processes forms n Poisson processes if the splitting is probabilistic (with constant probability)
Queueing model
Kendall's notations:
TODO: format
Examples Kendall’s notation, example: M/M/1 – M = Poisson arrival process – M = service time distribution is n.e.d. – 1 server – No system capacity given, i.e. infinite – No queuing discipline, i.e. arbitrary discipline can be used • Infinite server systems: M/M/∞ • Loss systems: M/M/n/n • Delay systems with infinite queue: M/M/n • Jackson’s queueing network: network of M/M/nj system
Dependability
Series system:
Availability:
Reliability function:
Parallel system:
Availability:
Reliability function:
Kofn system:
Availability:
Reliability function:
Mean Time to First Failure:
Mean Time to Catastrophic Failure:
Mean Time Between Failures:
Mean Up Time:
Mean Down Time:
Mean Time To Failure:
Mean Time to First Failure:
For a system that cannot be repaired and a structure of
independent elements with constant failure rate
How to improve system availability:
 Identify the most critical factor (element) in the system model
 Set element availability to 1
 Calculate the net change in system availability
 Repeat for all elements
 Which element has the larges impact on the system availability?
 What does it take to improve that element?
 Alternatives:
 Replacement of a component with lower failure rate
 Improve recovery time
 Change the structure by adding more
Cut set
Cut set
The system fails if all elements in the set fails Minimum cut set The system fails if and only if (iff) when all elements in the set fail, given that the system elements not in the set are working
Path set
Path set
The system is working if all elements in the set are working Minimum path set The system is working iff all elements in the set are working, given that all system elements not in the set are failed.
Example:
Selftest 1:
Question 1
Question 2
The main purpose of a protocol is to provide rules for the different entities in a system/network to cooperate. In other words a protocol dictates how various system parts interact with eachother, and is thus part of the behaviour property of the system.
Question 3
Definition from the book: Quality of Service (QoS) is the degree of compliance of a service to the agreement that exists between teh user and the provider of this service.
Question 4
Question 5
Question 6
Question 7
Question 8
Question 9
Question 10
Question 11
Question 12
Question 13
Question 14
Question 15
Question 16
Question 17
Question 18
Individual probabilities must be between 0 and 1. The sum of all probabilities must equal 1.
Question 19
Question 20
Question 21
15/500 = 0.03