TPK4120: Safety and Reliability Analysis
Note: This is a summary of the book rather than of the lectures. Only the required chapters are covered.
Basic concepts
Basic expressions and formulas...
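Estimated failure rate, where $n$ is the number of observed failures and $t_1, \dots, t_m$ are the times in service of the observed items:
$\hat{\lambda} = \frac{n}{\sum_{i=1}^{m} t_i}$
If the failure rate is constant, $z(t) = \lambda$:
$R(t) = e^{-\lambda t}, \quad \text{MTTF} = \frac{1}{\lambda}$
Probability that component 1 fails before component 2, for independent components with constant failure rates $\lambda_1$ and $\lambda_2$:
$\Pr(T_1 < T_2) = \frac{\lambda_1}{\lambda_1 + \lambda_2}$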
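As a rough numerical illustration of these basic expressions, the sketch below assumes the constant-failure-rate (exponential) model with independent components. The function names and the service-time data are made up for illustration and do not come from the book.

```python
import math

def estimate_failure_rate(n_failures, times_in_service):
    """Point estimate: number of failures divided by aggregated time in service."""
    total_time = sum(times_in_service)
    if total_time <= 0:
        raise ValueError("total time in service must be positive")
    return n_failures / total_time

def survival_probability(lam, t):
    """R(t) = exp(-lam * t), valid only for a constant failure rate lam."""
    return math.exp(-lam * t)

def prob_first_fails_first(lam1, lam2):
    """P(T1 < T2) for independent lifetimes with constant failure rates."""
    return lam1 / (lam1 + lam2)

# Hypothetical data: 4 failures observed over 5000 hours of aggregated service.
hours = [1200.0, 950.0, 1430.0, 800.0, 620.0]
lam_hat = estimate_failure_rate(4, hours)  # 4 / 5000 failures per hour

print(lam_hat)
print(survival_probability(lam_hat, 1000.0))
print(prob_first_fails_first(1e-3, 3e-3))
```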
Reliability
The main concept of the course is reliability, defined as "the probability that an item will perform a required function under stated conditions for a stated period of time". The term applies to components, subsystems, and systems, including hardware, software, and even humans.
Quality
By quality we mean the totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs. The quality of a product is characterized not only by its conformity to specifications at the time it is supplied to the user, but also by its ability to meet these specifications over its entire lifetime.
Availability
The ability of an item to perform its required function at a stated instant of time or over a stated period of time. The availability at time $t$ can be denoted $A(t) = \Pr(\text{item is functioning at time } t)$.
Failure Models
Five important measures of the reliability of a nonrepairable item:
- The reliability (survivor) function $R(t)$
- The failure rate function $z(t)$
- The probability density function $f(t)$
- The mean time to failure (MTTF)
- The mean residual life (MRL)
The Reliability Function
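The reliability function of an item with time to failure $T$ and distribution function $F(t)$ is defined as:
$R(t) = 1 - F(t) = \Pr(T > t) \quad \text{for } t > 0$
Hence
$R(t) = 1 - \int_0^t f(u)\,du = \int_t^\infty f(u)\,du$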
Failure Rate Function
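The probability that an item will fail in the interval $(t, t + \Delta t]$, given that it is functioning at time $t$, is
$\Pr(t < T \le t + \Delta t \mid T > t) = \frac{F(t + \Delta t) - F(t)}{R(t)}$
Dividing by the length of the time interval, $\Delta t$, and letting $\Delta t \to 0$ gives us the failure rate function
$z(t) = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t} \cdot \frac{1}{R(t)} = \frac{f(t)}{R(t)}$
Since $f(t) = \frac{d}{dt} F(t) = -\frac{d}{dt} R(t)$ and $R(0) = 1$, this gives
$z(t) = -\frac{d}{dt} \ln R(t), \quad R(t) = \exp\left(-\int_0^t z(u)\,du\right)$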
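The limit definition can be checked numerically. The sketch below assumes a Weibull lifetime with made-up shape and scale parameters (the distribution is chosen only for illustration) and compares the conditional failure probability per unit time over a short interval with the closed form $f(t)/R(t)$.

```python
import math

ALPHA = 2.0   # assumed Weibull shape parameter (illustrative)
LAM = 0.01    # assumed Weibull scale parameter, per hour (illustrative)

def R(t):
    """Survivor function of the assumed Weibull lifetime."""
    return math.exp(-((LAM * t) ** ALPHA))

def F(t):
    """Distribution function, F(t) = 1 - R(t)."""
    return 1.0 - R(t)

def f(t):
    """Probability density function, the derivative of F(t)."""
    return ALPHA * LAM * (LAM * t) ** (ALPHA - 1) * R(t)

def z_exact(t):
    """Failure rate function z(t) = f(t) / R(t)."""
    return f(t) / R(t)

def z_finite_difference(t, dt=1e-4):
    """Pr(t < T <= t + dt | T > t) divided by dt, for a small dt."""
    return (F(t + dt) - F(t)) / (R(t) * dt)

t = 50.0
print(z_exact(t), z_finite_difference(t))
```

The two printed values should agree closely, and the agreement improves as `dt` shrinks until floating-point cancellation starts to dominate.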
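From this, we can deduce the following relationships between the functions:
Expressed by | $F(t)$ | $f(t)$ | $R(t)$ | $z(t)$ |
$F(t) =$ | - | $\int_0^t f(u)\,du$ | $1 - R(t)$ | $1 - \exp\left(-\int_0^t z(u)\,du\right)$ |
$f(t) =$ | $\frac{d}{dt} F(t)$ | - | $-\frac{d}{dt} R(t)$ | $z(t) \exp\left(-\int_0^t z(u)\,du\right)$ |
$R(t) =$ | $1 - F(t)$ | $\int_t^\infty f(u)\,du$ | - | $\exp\left(-\int_0^t z(u)\,du\right)$ |
$z(t) =$ | $\frac{dF(t)/dt}{1 - F(t)}$ | $\frac{f(t)}{\int_t^\infty f(u)\,du}$ | $-\frac{d}{dt} \ln R(t)$ | - |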
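The relationships between the functions can also be used computationally. As a minimal sketch, assuming a made-up linearly increasing failure rate, the code below recovers $R(t)$ and $f(t)$ from $z(t)$ by integrating the failure rate numerically with the trapezoidal rule. The helper names are hypothetical.

```python
import math

def cumulative_hazard(z, t, steps=10_000):
    """Trapezoidal approximation of the integral of z(u) from 0 to t."""
    h = t / steps
    total = 0.5 * (z(0.0) + z(t))
    for i in range(1, steps):
        total += z(i * h)
    return total * h

def reliability_from_hazard(z, t):
    """R(t) = exp(-integral of z(u) from 0 to t)."""
    return math.exp(-cumulative_hazard(z, t))

def density_from_hazard(z, t):
    """f(t) = z(t) * R(t)."""
    return z(t) * reliability_from_hazard(z, t)

def z_linear(u):
    """Hypothetical failure rate that increases linearly with time."""
    return 1e-3 + 2e-5 * u

t_check = 100.0
r_numeric = reliability_from_hazard(z_linear, t_check)
# Closed form for comparison: integral of z is 1e-3 * t + 1e-5 * t**2
r_closed = math.exp(-(1e-3 * t_check + 1e-5 * t_check ** 2))
print(r_numeric, r_closed)
```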
MTTF
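Mean time to failure:
$\text{MTTF} = E(T) = \int_0^\infty t f(t)\,dt = \int_0^\infty R(t)\,dt$
if the failure rate is constant, $z(t) = \lambda$: MTTF = $\frac{1}{\lambda}$
else: MTTF = $\int_0^\infty R(t)\,dt$, evaluated for the distribution in question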
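A minimal sketch of how the MTTF can be evaluated as the area under the survivor function, $\int_0^\infty R(t)\,dt$. The parameters are made up, the infinite upper limit is replaced by a finite cutoff where $R(t)$ is negligible, and the Weibull case is included only as an assumed example of a non-constant failure rate.

```python
import math

def mttf_numeric(R, upper, steps=200_000):
    """Trapezoidal approximation of the integral of R(t) from 0 to `upper`."""
    h = upper / steps
    total = 0.5 * (R(0.0) + R(upper))
    for i in range(1, steps):
        total += R(i * h)
    return total * h

LAM = 2e-3  # assumed constant failure rate, per hour

def R_exp(t):
    """Survivor function for a constant failure rate."""
    return math.exp(-LAM * t)

def R_weibull(t, lam=0.01, alpha=2.0):
    """Survivor function for an assumed Weibull lifetime (illustrative)."""
    return math.exp(-((lam * t) ** alpha))

# Constant failure rate: the result should be close to 1 / LAM.
mttf_exp = mttf_numeric(R_exp, upper=40.0 / LAM)

# Non-constant failure rate: compare with (1 / lam) * Gamma(1 + 1 / alpha).
mttf_weibull = mttf_numeric(R_weibull, upper=1000.0)
mttf_weibull_closed = (1 / 0.01) * math.gamma(1 + 1 / 2.0)

print(mttf_exp, 1 / LAM)
print(mttf_weibull, mttf_weibull_closed)
```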
Markov
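Availability, where $P_j$ is the steady-state probability of state $j$ and $B$ is the set of functioning states:
$A = \sum_{j \in B} P_j$
Mean number of failures per time unit, where $\bar{B}$ is the set of failed states and $a_{jk}$ is the transition rate from state $j$ to state $k$:
$\omega = \sum_{j \in B} \sum_{k \in \bar{B}} P_j a_{jk}$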
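As a hedged illustration of these two Markov measures, the sketch below solves the steady-state equations $P \cdot A = 0$, $\sum_j P_j = 1$ for a single repairable item with two states. The failure and repair rates are made up, and the helper functions are hypothetical, written with plain Gaussian elimination to avoid external libraries.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def steady_state(A):
    """Steady-state probabilities for transition rate matrix A (rows sum to 0)."""
    n = len(A)
    M = [[A[j][k] for j in range(n)] for k in range(n)]  # transpose of A
    M[-1] = [1.0] * n                # replace one balance equation by sum(P) = 1
    b = [0.0] * (n - 1) + [1.0]
    return solve_linear(M, b)

failure_rate = 1e-3   # assumed, per hour
repair_rate = 0.1     # assumed, per hour

# State 0: functioning, state 1: failed.
A = [[-failure_rate, failure_rate],
     [repair_rate, -repair_rate]]
functioning, failed = [0], [1]

P = steady_state(A)
availability = sum(P[j] for j in functioning)
failure_frequency = sum(P[j] * A[j][k] for j in functioning for k in failed)

print(availability, failure_frequency)
```

For this two-state case the results can be compared with the closed forms $\mu/(\lambda+\mu)$ and $\lambda\mu/(\lambda+\mu)$, where $\lambda$ is the failure rate and $\mu$ the repair rate.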
Safety Instrumented Systems
Main System Functions
- When a predefined process demand (deviation) occurs in the EUC (equipment under control), the deviation shall be detected by the SIS sensors, and the required actuating items shall be activated and fulfill their intended functions
- The SIS shall not be activated spuriously, that is, without the presence of a predefined process demand in the EUC
Subsystem Components
- A sensor subsystem that shall detect a specified hazardous event or deviation (in the current case, the sensor subsystem comprises three gas detectors)
- A logic solver subsystem that interprets the signals from the sensor subsystem and sends an action signal to the final element subsystem
- A final element subsystem that shall take action upon signal from the logic solver (in the current case, the final element subsystem comprises a single shutdown valve, the ESDV)
Failure Classification
Traditional Classification
 | Dangerous | Safe |
Undetected | Dangerous undetected (DU): Dangerous failures that prevent activation on demand and are revealed only by testing or when a demand occurs. DU failures are sometimes called dormant failures. | Safe undetected (SU): Non-dangerous failures that are not detected by automatic self-testing. |
Detected | Dangerous detected (DD): Dangerous failures that are detected immediately when they occur, for example, by an automatic, built-in self-test. The average period of unavailability due to a DD failure is equal to the mean downtime, MDT, that is, the mean time elapsing from when the failure is detected by the built-in self-test until the function is restored. | Safe detected (SD): Non-dangerous failures that are detected by automatic self-testing. In some configurations, early detection of failures may prevent an actual spurious trip of the system. |
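The two-by-two classification can be expressed as a small lookup. The sketch below is only an illustration of the table's logic; the type and function names are hypothetical, and the example failure modes are made up.

```python
from enum import Enum

class FailureCategory(Enum):
    DU = "dangerous undetected"
    DD = "dangerous detected"
    SU = "safe undetected"
    SD = "safe detected"

def classify(dangerous: bool, detected: bool) -> FailureCategory:
    """Map the two table dimensions to one of the four categories.

    `dangerous`: the failure prevents activation on demand.
    `detected`: the failure is revealed by automatic self-testing.
    """
    if dangerous:
        return FailureCategory.DD if detected else FailureCategory.DU
    return FailureCategory.SD if detected else FailureCategory.SU

# Made-up failure modes, given as (description, dangerous, detected).
examples = [
    ("valve fails to close, found only at proof test", True, False),
    ("detector fault flagged by built-in self-test", True, True),
    ("false alarm signal flagged by self-test", False, True),
]
for description, dangerous, detected in examples:
    print(description, "->", classify(dangerous, detected).name)
```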
Classification Based on Cause
- Random hardware failures. These are physical failures where the supplied service deviates from the specified service due to physical degradation of the item. Random hardware failures can further be split into:
  - Aging failures. These failures occur under conditions within the design envelope of the item. Aging failures are also called primary failures.
  - Stress failures. These failures occur due to excessive stresses on the item. The excessive stresses may be caused by external causes or by human errors during operation and maintenance. Stress failures are also called secondary failures.
- Systematic failures. These failures are nonphysical failures where the supplied service deviates from the specified service without any physical degradation of the item. The failures can only be eliminated by a modification of the design or of the manufacturing process, operational procedures, or documentation. Systematic failures can further be split into:
  - Design failures. These failures are initiated during engineering, manufacturing, or installation and may be latent from the first day of operation. Examples include software failures, sensors that do not discriminate between true and false demands, and fire/gas detectors that are installed in the wrong place, where they are prevented from detecting the demand.
  - Interaction failures. These failures are initiated by human errors during operation or maintenance/testing. Examples are loops left in the override position after completion of maintenance and miscalibration of sensors during testing. Scaffolding that covers a sensor, making it impossible to detect an actual demand, is another example of an interaction failure.