Reliability Design of Complex Systems - Modeling and Efficient Simulation
Reliability is an important non-functional requirement of many man-made systems, especially when failures may lead to catastrophic events. When such systems are too complex to be understood and designed by one person, the effects of local design decisions on overall system properties are not obvious. Mathematical models can help to describe such systems and to compute their reliability with the help of appropriate software tools.
Unavoidable faults may be masked or tolerated by static or dynamic redundancy measures, but only at considerably increased cost. The main task is therefore to design a system that meets its reliability and safety requirements with the least amount of resources. Classic models and tools for static analysis cannot cover systems in which complex behaviour influences failures, or in which dynamic reconfiguration is applied (possibly because it offers a better trade-off between resources and reliability).
Depending on the complexity of the system behaviour and the corresponding size of the state space, Markov chains and stochastic Petri nets are applied to reliability problems. They are attractive models as long as the underlying assumption of Markovian behaviour is realistic; phase-type distributions can approximate other distributions up to a certain accuracy, but at the price of an even larger state space. Petri nets have recently been adopted in an international standard as a suggested tool for reliability engineering of complex systems.
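As a small illustration of how a Markov chain yields a reliability figure, the following sketch models a hypothetical two-component parallel system without repair as a continuous-time Markov chain and evaluates its transient state probabilities by uniformization. The system, the failure rate `lam`, the mission time and all function names are assumptions chosen for illustration and are not taken from the work summarized here; uniformization is simply one standard numerical technique for this kind of model.

```python
import math

def transient_distribution(Q, p0, t, tol=1e-12, max_terms=100_000):
    """Transient state probabilities of a CTMC with generator Q at time t,
    computed by uniformization (a sketch, not tuned for large q*t)."""
    n = len(Q)
    q = max(-Q[i][i] for i in range(n))  # uniformization rate
    if q == 0.0 or t == 0.0:
        return list(p0)
    # Embedded discrete-time chain: P = I + Q / q
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)]
         for i in range(n)]
    weight = math.exp(-q * t)            # Poisson probability of 0 jumps
    v = list(p0)                         # p0 * P^k, starting with k = 0
    result = [weight * x for x in v]
    acc = weight                         # accumulated Poisson mass
    k = 0
    while 1.0 - acc > tol and k < max_terms:
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k              # Poisson probability of k jumps
        acc += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result

# Hypothetical example: two identical components in parallel, no repair.
# States: 0 = both up, 1 = one up, 2 = system failed (absorbing).
lam = 1e-3       # assumed failure rate per hour
t = 1000.0       # assumed mission time in hours
Q = [
    [-2 * lam, 2 * lam, 0.0],
    [0.0,      -lam,    lam],
    [0.0,      0.0,     0.0],
]
p = transient_distribution(Q, [1.0, 0.0, 0.0], t)
reliability = 1.0 - p[2]

# Closed-form reliability of the same system, for comparison.
closed_form = 2 * math.exp(-lam * t) - math.exp(-2 * lam * t)
print(reliability, closed_form)
```

For this three-state model the closed form is available and the numerical result can be checked against it. Adding repair transitions or dynamic reconfiguration only changes the entries of `Q`, but the number of states grows quickly, which is the state-space issue mentioned above.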