TTM4110: Dependability and Performance with Simulation
The purpose of this course is to give an introduction to the conceptual and theoretical fundamentals of dependability and performance of ICT systems. Mathematical and software tools that can be used to analyse and dimension systems and network solutions are presented, and basic issues are discussed.
Functional and non-functional properties
Technical systems are described using two types of characteristics:
Functional properties: Which functions are performed.
Non-functional properties: How well these functions are performed.
These terms can be applied to any system. Take a car as an example: The primary function of a car is to transport people and goods from one place to another. Non-functional properties include the carrying capacity, the maximum speed, whether the car starts when needed, and whether it transports people and goods to their destination without breaking down or causing accidents. These are the performance (carrying capacity, speed) and dependability (the car starts and fulfills its function) properties of the system. Along with the price, they are rational arguments for comparing different designs.
When designing an ICT-system, the focus is mainly on the functions the system shall provide and how to implement them. However, in a real system, it is also crucial to consider non-functional properties such as dependability and performance. The non-functional requirements will have an impact on both the design and the cost of the system and determine its usability.
Every dependability and performance evaluation of a system, whether based on mathematical analysis, simulation or measurements, relies on a model of the system.
Model: A model is an abstraction of the real or projected system.
There is no standard recipe to elaborate a good model of a system and trade-offs must be made. On one hand a model must include sufficient details to represent the system, but on the other hand less important details must be left out to enable simulation in a reasonable time. Assumptions must be made so that the model can be expressed analytically, but at the same time one must ensure that the model still describes the real world. The results derived from a model should be valid for the real system, not just for the model.
The description of the system itself and the identification of what to include in the model are essential, but not sufficient. How the environment influences the system and vice versa must also be considered.
A system can be defined as a regularly interacting or interdependent group of items forming a unified whole, where an item may be a system, a subsystem, or an atomic component. When performing a dependability and/or performance evaluation of a system, the aim is to identify the items (system components) that limit the dependability and/or performance. The structure of the system reflects how these components interact. The interactions themselves are referred to as the behavior of the system.
An ICT-system is composed of system components of different types, for instance: processors (with processing capacity), hard disks (with storage capacity), transmission channels (with transmission capacity) and so on. These system components and their capacities are the resources in an ICT-system. The amount of resources limits the system dependability and performance. The resources are what is utilized when the system is used.
The structure of a system indicates how the resources in the system must or should be utilized in order to deliver the service whose dependability and/or performance properties are evaluated. The structure of a system may be physical, logical, or derived from the physical and/or logical structures.
It is necessary to simplify and make an abstraction of the behavior of the real system. Important aspects of the behavior that may be included in a model are:
Queueing disciplines: What to do when all resources in the system are busy and someone/something new wants to use the system.
Protocols: Provide rules for the different entities in a system/network so they can cooperate.
Traffic mechanisms apart from protocols: Routing algorithms, connection admission control (CAC), usage parameter control (UPC) and so on.
Fault-handling: Mechanisms which encompass error detection, localization, isolation, and various techniques to provide fault tolerance and automatic and manual fault removal (repair). It is important to include the possibility that the system does not behave as intended, e.g. that an error in the system is not detected.
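As an illustration of the queueing discipline item in the list above, the following is a minimal sketch of one possible model: a single server with a first-in-first-out (FIFO) waiting queue of limited size. The function name simulate_fifo, the fixed service time and the integer time stamps are assumptions made for this example only; they are not taken from the course material. With buffer_size set to 0 the model becomes a pure loss system, where a new job is rejected whenever the server is busy.

```python
from collections import deque


def simulate_fifo(arrivals, service_time, buffer_size):
    """Single server, FIFO order, at most `buffer_size` jobs waiting.

    arrivals: sorted arrival times of the jobs.
    service_time: fixed time the server needs per job (an assumption).
    Returns (departure times of accepted jobs, number of blocked jobs).
    """
    in_system = deque()  # departure times of accepted, unfinished jobs
    departures = []
    blocked = 0
    for t in arrivals:
        # Remove jobs that have already left the system at time t.
        while in_system and in_system[0] <= t:
            in_system.popleft()
        # One job may be in service, `buffer_size` jobs may wait.
        if len(in_system) >= 1 + buffer_size:
            blocked += 1
            continue
        # Service starts on arrival, or when the job ahead has departed.
        start = max(t, in_system[-1]) if in_system else t
        done = start + service_time
        in_system.append(done)
        departures.append(done)
    return departures, blocked


# Same traffic, two disciplines: pure loss versus one waiting position.
loss_result = simulate_fifo([0, 1, 2, 3], service_time=3, buffer_size=0)
wait_result = simulate_fifo([0, 1, 2, 3], service_time=3, buffer_size=1)
```

Even in this small sketch, the choice of discipline changes what the user experiences: the loss system rejects more jobs, while the system with a waiting position accepts more jobs at the cost of longer delays.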
Concepts and terminology
This section introduces concepts and terminology related to dependability and performance.
Quality of service
Quality of service (QoS): Degree of compliance of a service to the agreement that exists between the user and the provider of this service.
In this context, a user is an entity that uses a service provided by another entity, but is not necessarily an end-user of the service. A service is a set of functions (service primitives) that are offered on an interface between the user and the provider (not necessarily a physical interface). A QoS parameter is a (random) variable that characterizes the service.
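The definition above does not say how the degree of compliance is measured. As a sketch under stated assumptions, suppose the agreement specifies a maximum delay and the QoS parameter is the observed delay of each request. One simple measure is then the fraction of observations that meet the agreed bound. The function name compliance_ratio and the numbers used are hypothetical.

```python
def compliance_ratio(samples, agreed_max):
    """Fraction of observed QoS parameter values within the agreed bound.

    samples: observed values of the QoS parameter (e.g. delays in ms).
    agreed_max: upper bound stated in the (assumed) service agreement.
    """
    if not samples:
        raise ValueError("at least one observation is required")
    within = sum(1 for s in samples if s <= agreed_max)
    return within / len(samples)


# Hypothetical delay measurements in milliseconds, agreed bound 50 ms.
delays_ms = [12, 48, 51, 30]
ratio = compliance_ratio(delays_ms, agreed_max=50)
```

Because a QoS parameter is a random variable, a single observation says little; a measure of this kind only becomes meaningful over a sufficiently large set of observations.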
Dependability: Trustworthiness of a system such that reliance can justifiably be placed on the service it delivers.
Dependability is a high-level concept. In addition to the dependability attributes which will be discussed later, dependability also encompasses the impairments that could affect the trustworthiness of a system, and the means to attain dependability.
Failure: Deviation of the delivered service from compliance with the specification. Transition from correct service to incorrect service (e.g. the service becomes unavailable).
Error: Part of the system state which is liable to lead to a failure.
Fault: Adjudged or hypothesized cause of an error.
Example: An electromagnetic pulse (fault) flips a bit in a data register (error). When this register is accessed, a wrong result is returned to the user (failure). Another example is a software engineer who writes incorrect code and thereby introduces a logical fault into a software module. This incorrect code is a (dormant) fault embedded in the system. Certain input values will activate the fault and produce an error in the module's state. The services delivered by the software module may then crash or produce incorrect output (failure).
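The first example can be traced step by step in a small sketch. The Register class, its 8-bit width and the chosen values are hypothetical and serve only to separate the three terms: the bit flip is the fault, the corrupted stored value is the error, and the wrong value returned on access is the failure.

```python
class Register:
    """Hypothetical 8-bit data register."""

    def __init__(self, value):
        self.value = value & 0xFF

    def flip_bit(self, bit):
        # Models the effect of the fault: one bit of the state changes.
        self.value ^= 1 << bit

    def read(self):
        # The service offered to the user: return the stored value.
        return self.value


written = 0b00000101           # value the user stored (5)
reg = Register(written)

reg.flip_bit(1)                # fault: the pulse flips bit 1
error_present = reg.value != written   # error: state is now corrupted

delivered = reg.read()         # the register is accessed
failure = delivered != written # failure: the user gets a wrong result
```

Between the bit flip and the read, the error exists in the system state but the user has not yet seen a failure. If the register were overwritten before being read, the error would disappear without ever causing one.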
Two basic approaches to achieve a dependable system:
fault prevention, i.e. to prevent the occurrence or introduction of faults.
fault tolerance, i.e. to prevent errors from causing failures or, in other words, to deliver a correct service despite the presence of faults.
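Fault tolerance can be illustrated with majority voting over redundant replicas. This technique is chosen here as an example and is not named in the text above. The sketch assumes three replicas that compute the same result, of which at most one is affected by an error; the function name majority_vote is hypothetical.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value delivered by a strict majority of the replicas.

    Raises RuntimeError when no value has a strict majority, since the
    voter can then no longer mask the errors.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority among replica outputs")
    return value


# Replica 2 holds an erroneous value; the voter masks it, so the
# service delivered to the user is still correct.
result = majority_vote([5, 7, 5])
```

The fault and the resulting error are still present in one replica, but the error does not propagate to the service interface, so no failure occurs. The price is additional resources: three replicas and a voter instead of one component.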
Types of faults: physical faults (physical wear on components), transient faults (only present for a short period of time), intermittent faults (faults that come and go), design faults (human-made faults introduced during specification, design and implementation of a system), interaction or operational faults (accidental faults made by humans operating or maintaining a system), and environmental faults (faults originating outside the system boundaries).
Availability: Ability of a system to provide a set of services at a given instant of time or at any instant within a given time interval.
Reliability: Ability of a system to provide uninterrupted service.
Safety: Ability of a system to provide service without the occurrence of catastrophic failures.
Performance: Ability of a system to provide the resources needed to deliver its services.
Capacity: Maximum load a system can handle per time unit.
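The definitions of availability and reliability above are qualitative. As a sketch of how they might be turned into numbers, assume that a log of outage periods is available for an observation interval. Availability over the interval can then be estimated as the fraction of time the system was up, and the service was uninterrupted if no outage overlaps the interval. The function names and the log format are assumptions made for this example.

```python
def downtime_in_interval(down_periods, start, end):
    """Total outage time that falls inside [start, end].

    down_periods: non-overlapping (begin, finish) outage intervals.
    """
    total = 0
    for begin, finish in down_periods:
        overlap = min(finish, end) - max(begin, start)
        if overlap > 0:
            total += overlap
    return total


def interval_availability(down_periods, start, end):
    """Fraction of [start, end] during which the system was up."""
    downtime = downtime_in_interval(down_periods, start, end)
    return 1 - downtime / (end - start)


def uninterrupted(down_periods, start, end):
    """True if no outage overlaps [start, end]."""
    return downtime_in_interval(down_periods, start, end) == 0


# Hypothetical outage log (time units are arbitrary).
outages = [(10, 15), (80, 120)]
availability = interval_availability(outages, 0, 100)
no_interruption = uninterrupted(outages, 20, 70)
```

The two measures answer different questions: a system with many short outages can have high availability and still rarely deliver uninterrupted service over a long interval.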