Technical Report
Formal Methods in Resilient Systems Design using a Flexible Contract Approach
-
Systems Engineering and Systems Management Transformation
Report Number: SERC-2018-TR-119
Publication Date: 2018-12-21
Project:
Formal Methods in Resilient Systems Design using a Flexible Contract Approach
Principal Investigators:
Dr. Azad Madni
Co-Principal Investigators:
Dr. Barry Boehm
Dan Erwin
Abstract:
Resilience, a non-functional property, allows a system or system of systems (SoS) to continue to provide useful service in the face of disruptions (Neches and Madni, 2011). Disruptions can be external, systemic, or human-triggered (Madni and Jackson, 2009). Examples of disruptions in the operational context of multi-UAV swarms include a hacked or compromised swarm member, loss of communication within the swarm or between specific swarm members, and loss of visibility due to extreme weather or sensor malfunction. Resilient responses to such disruptions can take a variety of forms depending on environment observability and available intelligence. These include: circumvent disruptions if they can be anticipated; withstand disruption if within designed performance envelope; and recover rapidly from the negative effects of disruptions outside the performance envelope. Practically speaking, this means dynamically extending system capacity to cope with disruptions; restructuring or reconfiguring system pursuant to disruptions; and continuing to operate at somewhat diminished but acceptable level. The system’s design envelope includes system models and adaptation logic incorporated within the system model to produce the requisite resilient responses when such disruptions occur. Doyle (2016) defines resilience as the ability to recognize unanticipated perturbations that fall outside the system’s design envelope. This definition implies that resilience is concerned with monitoring the boundary conditions of the system’s model for competence (how well resilience strategies match disruption demands), and then adjusting or expanding that model to better accommodate changing demands (Neches and Madni, 2011). The key issue here is assessing an organization’s adaptive capacity (i.e., resource buffers that allow resources of a particular type to be increased on demand to a maximum limit) relative to the challenge posed by the disrupting event to that adaptive capacity. Boundaries in the multi-UAV swarm context define the system’s competent performance envelope relative to specific classes of disruptions and uncertainties. Therefore, resilience engineering in a certain sense is concerned with introducing transparency into an organization’s safety model with the purpose of determining when the model needs to be revised. In other words, resilience engineering is concerned with monitoring a system’s decision making with a view to assessing the system’s risks and risk envelope relative to unsafe operating boundaries.
Risk monitoring implies proactive and automatic/semi-automatic monitoring of buffers, margins, and tolerances. Buffer capacity is concerned with the magnitude and type of disruptions a system can absorb or adapt to without a substantial degradation in system performance or breakdown in integrity of system structure. Flexibility is the ability of a system to restructure or reorganize itself in response to external changes or pressures (Madni, 2009). Margin is the proximity of a system’s operation regime relative to its designed operational performance envelope or boundary. Tolerance is the ability of a system to degrade gracefully (as opposed to collapsing) as stress/pressure increases, or when disruption magnitude and/or severity exceeds its adaptive capacity.
This paper presents a model-based approach that combines formal and probabilistic modeling to engineer resilient system and verify their designs.