Technical Report
Methods to Evaluate Cost/Technical Risk and Opportunity Decisions for Security Assurance in Design
-
Trusted Systems
Report Number: SERC-2020-TR-005
Publication Date: 2020-06-12
Project:
Methods to Evaluate Cost/Technical Risk and Opportunity Decisions for Security Assurance in Design
Principal Investigators:
Thomas McDermott Jr.
Co-Principal Investigators:
Dr. Cody Fleming
This research addresses needs defined by the Office of the Undersecretary of Defense for Research and Engineering (OUSD/R&E), Strategic Technology Protection and Exploitation (STPE) Division to develop standard approaches to “design in” security and resilience for current and future weapon systems. The proposal closely aligns with the OUSD/R&E’s Digital Engineering Strategy (DES) and the Cyber Resilient Weapon Systems (CRWS) initiative. It extends ongoing research in Security Engineering within the Systems Engineering Research Center (SERC) to a broader definition of system assurance. It addresses a gap in current systems engineering methods, processes, and tools (MPTs) associated with early-phase requirements assessment in Cyber Resilience system trades. Our STPE research sponsors are specifically interested in developing new standard approaches that combine security assurance and safety assurance (as well as other assurance concerns) in a common, model-based, systems engineering process. This integrated view is shown in Figure 1 in the report.
This research responds explicitly to sponsor desires to leverage relationships between system safety and systems engineering to improve system security and resilience. System safety has a history of successfully integrating practice into the systems engineering process to enable more interdisciplinary collaboration and better-informed trades [1]. Reed and McEvilley offer a working definition of synergistic safety and security as “Freedom from those conditions that can cause death, injury, or occupational illness; damage to or loss of equipment or property; damage to the environment; damage or loss of data or information; or damage or loss of capability, function, or process.” Loss scenarios, assurance claims, goals, and resulting safety/security requirements and constraints are used in the combined evaluation of safety and security evidence in the design process. Assurance claims are system attributes evaluated in systems engineering trades. The resulting system design must follow methodologies that consider need, design and evaluation rigor, and return on investment.
Security, safety, and resilience (and associated dependability attributes of systems) can be explored in an integrated process focused on concepts of loss. A system’s resilience is its ability to avoid loss, withstand disruptions that may result in loss, recover from these disruptions, and adapt to internal and external events that may cause disruption [2]. In this context, system assurance is a loss-driven methodology for identifying and evaluating resilience alternatives and balancing the effectiveness and affordability of system design alternatives. It considers four modeling goals:
- a model of the system and its mission, operational tasks, behaviors, and structure,
- a model of maximum reasonable assurance – the decision process that considers system performance, safety, security, dependability, and associated characteristics, and determines the appropriate responses to malicious and non-malicious disruption to the system that could result in losses,
- models that capture engineering rigor – the engineering methods and processes that support the specification, architectural definition, design, analysis, and verification & validation of the system, and
- a system resilience model – a model that communicates the system, the threats to the system (disruptions and resultant losses), the assurance decisions (requirements and constraints), and the countermeasures (design decisions and resilience modes) added to the system model.
The research focused on capturing all four modeling goals in a consistent environment using Model-Based Systems Engineering (MBSE) methods, processes, and tools. A primary outcome of this research is the development and maturation of a meta-model capturing central concepts of a system (operations, function, structure, requirements), assurance (loss, loss effect, and loss scenario), and resilience design (functions that avoid, withstand, recover, and adapt) into MBSE tool constructs. Figure 2 in the report summarizes the process goals for the research.
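To make the meta-model concepts above more concrete, the following is a minimal Python sketch of how losses, loss scenarios, and resilience functions might be related and checked for coverage. It is not the MA Meta-Model itself or its MBSE tool schema: the class names, the fields, the `uncovered_scenarios` helper, and the pipeline examples are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ResilienceStrategy(Enum):
    """The four resilience behaviors named in the text."""
    AVOID = "avoid"
    WITHSTAND = "withstand"
    RECOVER = "recover"
    ADAPT = "adapt"


@dataclass
class Loss:
    name: str
    effect: str  # the loss effect, stated in mission or stakeholder terms


@dataclass
class LossScenario:
    description: str
    loss: Loss


@dataclass
class ResilienceFunction:
    name: str
    strategy: ResilienceStrategy
    addresses: List[LossScenario] = field(default_factory=list)


def uncovered_scenarios(scenarios, functions):
    """Return the loss scenarios that no resilience function addresses."""
    # Dataclasses with eq=True are unhashable, so track coverage by identity.
    covered = {id(s) for f in functions for s in f.addresses}
    return [s for s in scenarios if id(s) not in covered]


# Hypothetical example data, not taken from the published model.
spill = Loss("Loss of product", "Environmental damage and lost revenue")
s1 = LossScenario("Pump controller receives a spoofed setpoint", spill)
s2 = LossScenario("Operator display shows stale pressure data", spill)
fallback = ResilienceFunction(
    "Fall back to local control", ResilienceStrategy.WITHSTAND, [s1]
)

gaps = uncovered_scenarios([s1, s2], [fallback])
print([g.description for g in gaps])
```

In this sketch, a scenario left in `gaps` would signal a loss scenario with no resilience mode assigned, which is the kind of traceability question an integrated model is meant to answer.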
For several years, a principal focus of the Trusted Systems research thrust within SERC has been developing methods and tools that support system design for cyber resilience in cyber physical systems. This body of work features the development of the Mission Aware (MA) framework for integration and alignment of cyber engineering requirements with the system development lifecycle and systems engineering processes. MA includes techniques to evaluate cyber physical system threats and attacks, requirements and design concepts for cyber resiliency, and model-based tools for selecting resilient architectures. The MA framework’s centerpiece is a risk analysis that integrates the perspectives of mission owners, systems engineers, and adversary red teams into a common model-based form.
MA was developed through a series of SERC research efforts, notably RT-156, RT-172, RT-191, RT-196, WRT-1013, and this effort: ART-004. WRT-1013 developed a meta-model that can be used to derive model-based systems engineering (MBSE) representations of systems [4] [5] [6] [7]. The meta-model includes loss scenarios, hazards, threat activities, system resilience modes of operation, and control-driven representations of security requirements. The meta-model captures the results of a standard Cyber Security Requirements Methodology (CSRM) intended to be conducted through the early stages of system definition and development, which was matured in RT-191 and RT-196. The MA Meta-Model was demonstrated in a current-generation MBSE software suite. ART-004 extends this work to a formal methodology for assurance case reasoning in resilient cyber design that can be standardized across the DoD Mission Engineering and system definition phases of a weapon system.
Research Goals and Results:
The research goals center on two primary questions:
- Can we define a standard methodology to integrate cyber resilience analysis into systems engineering activities building from the success of safety engineering activities?
- Can we define a framework for decision metrics that consider both the cyber threat and system model to inform tradespace analysis of the system resilience model?
Figure 3 in the report shows, at a high level, the project's strategy for extending previous SERC MA research to address the two central research questions.
The research produced several contributions to the fields of safety, security, and resilience:
- Development of a candidate approach for “loss-driven systems engineering.”
- Assessment of the application of different assurance standards and modeling methods in consideration of a system's combined safety and security characteristics.
- Exploration of previous research on model-based system assurance and its ability to extend to more complex systems-of-systems.
- A standard means, using a CONOPS format and models, to express concepts of threat, resilience, safety, and assurance from the mission level down to design, and from the operational view to the engineering views.
- Development of a metrics framework that links together threat motivation with system loss trades that can be expressed as decision metrics at multiple levels of the system.
- Demonstration of the approach in a publicly accessible modeling case study.
Loss-Driven Systems Engineering
The systems engineering community seeks to formalize an approach to address the potential for loss and associated effects resulting from developing and employing an engineered system. While much of systems engineering focuses on the delivery of desired capabilities, loss-driven systems engineering addresses potential losses associated with the system of interest. Loss-driven systems engineering is directed by several specialty engineering areas: safety, security, operational risk, resilience, protection, recovery, reliability, and other system ‘ilities. The potential for loss associated with a system is currently addressed independently by these different specialty engineering areas. System attributes such as resilience and infrastructure protection have a common association with these specialty areas through the concept of loss and associated loss impacts. These are shown in Figure 4 in the report. Systems architecting and specialty engineering practices share many commonalities and synergies around how loss and related effects are addressed through requirements, architecture, design, analytics, modeling, simulation, and verification. In particular, the concepts of loss, loss effect, and associated loss scenarios use common abstractions at all phases and levels of the systems engineering process, from mission engineering to detailed design, and from the concept of operations to verification and validation.
The goal of capturing all of these specialty perspectives in an integrated architecture model using MBSE tools is a crucial outcome of this research.
Integrating assurance standards and modeling methods
Assurance, as defined, is grounds for justified confidence, gained before depending on a system, that a claim about dependability, safety, or security has been (or will be) achieved. A claim is a true-false statement about one of these properties of a system [8]. Assurance is related to the “requirements of a property of a system.” As defined by NATO, system assurance is the justified confidence that a system functions as intended and is free of exploitable vulnerabilities, either intentionally or unintentionally designed or inserted as part of the system at any time during the life cycle. System “functions as intended” and “free of exploitable vulnerabilities” represent the system's highest-level properties [9]. The cybersecurity community has focused too firmly on exploitable vulnerabilities. It needs a much more rigorous approach to gain confidence that the system functions as intended in the presence of external threats. This confidence is achieved by system assurance activities, including a planned, systematic set of multi-disciplinary activities to meet the acceptable measures of system assurance and manage the risk of exploitable vulnerabilities. One can argue that “functions as intended,” for systems of any complexity, requires a modeling method that relates system function to the requirements properties of a system that define its dependability, safety, and security. These properties can be constraints on the system function or additional system functions that support assurance activities. The CSRM and MA Meta-model, as defined in this research, provide a standardized approach to link modeling, assurance cases, system constraints (requirements), and what we term “resilience modes” (additional system functions) in an MBSE toolset [6]. The MA Meta-model provides a standard set of design patterns to formalize this approach.
An assurance case, per ISO/IEC/IEEE 15026, is a reasoned, auditable artifact that supports the contention that an assurance claim has been satisfied, including systematic argumentation and supporting evidence. The assurance case components include claims, arguments, evidence, justifications, and assumptions. The goal of an assurance case is to communicate the assurance properties to stakeholders, informing their decision-making, and providing the necessary confidence in the system [10]. This report will show how a well-structured assurance pattern in an MBSE model improves standard assurance artifacts. It describes both the intended function and exploitable vulnerabilities in a common pattern. Neither standalone assurance cases in an argument-based format, nor tables of vulnerabilities, hazards, and risk, can compete with a functional model for communicating the linkage between vulnerabilities and intended function.
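The assurance case components listed above (claims, arguments, evidence, justifications, and assumptions) can be sketched as a small tree structure. This is an illustrative sketch only, not the ISO/IEC/IEEE 15026 or SACM data model: the class layout, the `is_supported` rule (a claim needs direct evidence, or sub-claims that are all supported), and the example statements are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    description: str
    quantitative: bool = False  # evidence may be qualitative or quantitative


@dataclass
class Claim:
    statement: str            # a true-false statement about a system property
    argument: str = ""        # reasoning that links support to the claim
    justification: str = ""   # why the argument is considered appropriate
    evidence: List[Evidence] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)
    subclaims: List["Claim"] = field(default_factory=list)

    def is_supported(self) -> bool:
        """Assumed rule: direct evidence, or sub-claims that are all supported."""
        if self.evidence:
            return True
        return bool(self.subclaims) and all(c.is_supported() for c in self.subclaims)

    def unsupported(self) -> List[str]:
        """Statements of the leaf claims that still lack evidence."""
        if self.evidence:
            return []
        if not self.subclaims:
            return [self.statement]
        return [s for c in self.subclaims for s in c.unsupported()]


# Hypothetical claims for illustration only.
top = Claim(
    statement="Pipeline pressure stays within safe limits under a setpoint attack",
    argument="Decomposed into detection and response",
    assumptions=["Local controllers are not compromised"],
    subclaims=[
        Claim(
            "Anomalous setpoints are detected",
            evidence=[Evidence("Simulation of spoofed setpoints", quantitative=True)],
        ),
        Claim("The station falls back to local control"),
    ],
)

print(top.is_supported())
print(top.unsupported())
```

Walking the tree this way shows which claims still need evidence, which is one way an auditable artifact can inform stakeholder decisions.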
Assurance cases capture subjective argument and structure judgment through claims, which must be supported by evidence. Assurance cases include justification based on different methods of reasoning about the system properties. Aspects of dependability, safety, and security differ in their ways of reasoning. These methods produce evidence that can be qualitative or quantitative, and deterministic or non-deterministic [11]. Ideally, the development of an assured system would include cases that move from qualitative to quantitative and from non-deterministic to deterministic as the system lifecycle matures. A model is a useful means to capture and manage the relationships between these different reasoning methods. Figure 5 in the report shows the relationship between different assurance reasoning approaches and the system decision lifecycle.
The different assurance methods loosely map to different points in the system lifecycle. This project has been particularly interested in reasoning at higher abstraction levels, where the system's intended function would be initially defined. The cyber resilience process is heavily focused on high-level system behaviors and associated mission resilience features of the system, assuming that not all exploitable vulnerabilities can be eliminated. The assurance process should be started in conceptual stages, particularly mission engineering and system definition activities. In these stages, concepts of dependence and loss can be defined and prioritized as requirements, even though system vulnerabilities cannot. The evaluation criteria (proximity to the stage of the lifecycle addressed, reality, abstraction/level of fidelity, interoperability, tool support, and verification and validation) were most closely met by the System-Theoretic Accident Model and Processes (STAMP) and the associated System-Theoretic Process Analysis (STPA) tools. The unification of the STPA technique for safety analysis and the derived STPA-Sec for security analysis has been shown to guide in-depth security analysis to the most vulnerable and critical components of a system. This research confirms that STPA, developed in the safety community, is the most effective method for reasoning about security assurance. STPA-Sec has been integrated into the CSRM and MA Meta-Model. Together, these support the assurance argument across a larger portion of the lifecycle. The Structured Assurance Case Meta-model (SACM), which standardizes the structure and use of assurance case language, integrates well with CSRM and the MA Meta-Model but uses different language constructs and reasoning approaches.
SACM and other argument- or claim-based approaches do not formalize the concept of architectural design patterns for safety, security, and resilience, which limits their ability to describe resilience, for instance. Other approaches such as Hazard and Operability Analysis (HAZOP), fault and attack trees, and formal methods are more useful once the system architecture and preliminary design have been described (once component classes have been selected) [8] [12] [13]. This research suggests CSRM and the MA Meta-Model are valuable additions to the assurance tool suite and can be more fully integrated with system architecture models.
A Complex Systems Case Study
An initial step in this research was to select a case study that could be used to apply and demonstrate the methods, processes, and tools used or developed in the project. The case study needed to be an example of a cyber-physical system, to operate within a complex system-of-systems, and to allow the relationships between threat goals and benefits to be quantified with respect to the costs and risks associated with system resilience. We also wanted an openly publishable example. The selected case study represents a unique example of an advanced persistent threat (APT) in a critical infrastructure system exploited for monetary gain: pipeline and oil pumping stations and associated pipeline oil delivery operations and market activities. The case study was selected because (1) it represents a plausible APT in a critical infrastructure managed through both human operators and cyber-physical control systems; (2) the relative scope of the threat team’s effort can be estimated, and the monetary gains from the attack can be modeled; and (3) it represents an exploitable gap in existing security practices in large systems tied to multiple organizations in the supply chain. In addition, the case study architecture has a scope that can be modeled in present MBSE tools.
Standard means to express concepts of threat, resilience, safety, and assurance in a model
This research firmly established the credibility of the MBSE MA Meta-Model as an effective path toward security assurance in early-stage design. The general approach developed serves as the basis for a repeatable yet flexible method. The framework and foundations established in the research are ready for transition. A particular transition focus is toward mission engineering and early-stage system definition in government MBSE modeling settings. Still, the techniques can and should be applied consistently across all program lifecycle phases. Modeling assurance cases and the resulting resilience modes of the system is a crucial aspect of system architecting, and the MBSE MA Meta-Model provides a standard architectural representation for loss scenarios, assurance requirements, and resilience features of the architecture. Assurance cases are intended to be developed and maintained for the full lifecycle. The MBSE MA Meta-Model provides a standard approach to capture all aspects of the assurance process.
The case study and model were integrated so that researchers analyzing how a threat would exploit the system worked alongside researchers developing the Meta-Model. The goal was to simulate the reasoning concerning threats and modeled assurance properties in a realistic setting. As this process proceeded, the operational relationships between threat and assurance were captured in the form of a standard Concept of Operations (CONOPS). This approach proved useful in the CSRM process used on this project and aligned well with the way the DoD documents early-stage concept definition activities. The CONOPS format was extended to capture the system changes needed to counter cyber threats in an operational context. The CONOPS table of contents is included as Appendix B in this report.
Metrics framework that links together threat motivation with system loss trades
The oil and gas case study allowed the definition of resilience metrics for evaluating the effectiveness of resilience solutions in response to safety and security violations while achieving operational priorities. The case study's meta-model relates the expert and operator perspectives, which are required to rank system losses by priority and to determine the likelihood and severity of attack vectors when evaluating the effectiveness and complexity of resilient modes. An essential set of metrics at the full system level includes attacker gain and defender loss, which have been poorly described in other security analysis methods. Other important evaluation metrics for resilient system modes include the operational impact and the time budget for system recovery. Recovery time includes detection time, isolation time, and restore time, including any operator decision time. System simulation can evaluate the recovery ratio for critical system functions under various system loads and simulated attack patterns. Tradespace analysis, based on resilience metrics, enables specification of a system that responds to safety and security violations while achieving operational priorities within programmatic cost and time constraints. The use case, modeling, and Meta-Model showed the feasibility of evaluating such metrics for a given system. The research supports a resilience evaluation metrics framework that links together threat motivation with system loss trades, but further work is needed to formalize it.
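The recovery-time decomposition described above lends itself to a short worked sketch. The Python below sums detection, isolation, and restore time against a recovery time budget. Two points are assumptions rather than definitions from the report: operator decision time is modeled as a separate additive term, and the "recovery ratio" is taken to be the fraction of simulated runs that recover within the budget. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTimeline:
    """Times in seconds for one simulated attack run (hypothetical values)."""
    detection_s: float
    isolation_s: float
    restore_s: float
    operator_decision_s: float = 0.0  # assumption: modeled as its own term

    @property
    def recovery_time_s(self) -> float:
        # Recovery time = detection + isolation + restore (+ operator decision).
        return (
            self.detection_s
            + self.isolation_s
            + self.restore_s
            + self.operator_decision_s
        )

    def within_budget(self, budget_s: float) -> bool:
        return self.recovery_time_s <= budget_s


def recovery_ratio(runs, budget_s):
    """Assumed definition: fraction of runs that recover within the time budget."""
    if not runs:
        raise ValueError("at least one run is required")
    return sum(r.within_budget(budget_s) for r in runs) / len(runs)


runs = [
    RecoveryTimeline(detection_s=30, isolation_s=60, restore_s=120, operator_decision_s=30),
    RecoveryTimeline(detection_s=45, isolation_s=90, restore_s=300),
]
budget_s = 300

print(runs[0].recovery_time_s)         # 240 seconds, inside the budget
print(recovery_ratio(runs, budget_s))  # one of the two runs meets the budget
```

A tradespace study could compare such ratios across candidate resilience modes alongside cost, although the report notes that the metrics framework itself still needs to be formalized.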
Demonstration of the approach in a publicly accessible modeling case study
The published, open-source model on GitHub is decomposed and organized according to the Mission Aware methodology using the Vitech GENESYS MBSE modeling tool, which was extended with our Meta-Model. The Meta-Model does not depend on this particular tool; we use the tool and its associated diagrams to visualize the different model views defined by the Meta-Model. The public model can be explored at: https://coordinated-systems-lab.github.io/pipeline-cps/index.html.
The web-view model navigator, pictured in Figure 6 in the report, shows a package view that organizes the model artifacts presented in this technical report. Expanding a package folder presents a hierarchy of related entity types. The System package defines the base System Model for the system under examination. Artifacts of the system description include the system context, the architecture of the system, and its functional behavior. The Risk package captures the Assurance Model (assurance cases), expressed in terms of losses, hazards, and unsafe actions. The Resilience package captures the system components and behaviors added to the system's base model that implement its resilience modes of operation, which we call the Mission Aware system. The Cyber package links loss scenarios to specific cyberattack vectors in an integrated Threat Model. Together, these form the MA Meta-Model. Further elaboration on the publicly accessible model is in the Oil & Gas Pipeline Model section of the report.