Data Science Approaches to Prevent Failure in Systems Engineering
Systems Engineering and Systems Management Transformation
Report Number: SERC-2019-TR-008
Publication Date: 2019-06-14
Project: Data Science Approaches to Prevent Failure in Systems Engineering
Dr. Karen Marais
Dr. Bruno Ribeiro
This technical report documents progress under SERC RT-206 from June 15, 2018 (task order start date) to June 14, 2019 (task order completion date). The research is motivated by a pressing need to track project risk in order to prevent future systems engineering failures; advances in data science and neural network applications make such tracking feasible.
Our work focuses on developing automated ways of tracking project risk based on two types of readily available information: enterprise software-derived data (Company inputs) and employee data collected via an app (Crowd inputs). The Company inputs carry risk information related to the daily operations of the organization (e.g., inventory data, number of failed parts, or financial data). We augment the database with Crowd inputs because we want to know how the actions of people in the organization contribute to project risk. The underlying principle of our process is to collect these inputs continuously, frequently, and efficiently, and then process them using machine learning algorithms to predict failures. Predicting failures lets us make decision makers aware of the current risk of the projects in their organization, giving them the opportunity to react before a failure occurs.
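As a rough illustration of the process described above, the sketch below combines Company and Crowd inputs for one project-week into a single failure-risk score and flags projects that exceed a threshold. It is not the prototype's implementation: the field names, the weights, the bias, and the threshold are all hypothetical, and a simple logistic score stands in for the machine learning models developed in this work.

```python
import math
from dataclasses import dataclass


@dataclass
class WeeklyInputs:
    """One project-week of inputs. All field names are hypothetical."""
    project: str
    failed_parts: int      # Company input: count of failed parts
    budget_overrun: float  # Company input: fraction over budget
    crowd_concern: float   # Crowd input: mean app response scaled to [0, 1]


# Hypothetical weights; a real system would learn these from data.
WEIGHTS = {"failed_parts": 0.4, "budget_overrun": 2.0, "crowd_concern": 3.0}
BIAS = -3.0


def failure_risk(x: WeeklyInputs) -> float:
    """Map the combined inputs to a score in (0, 1) with a logistic function."""
    z = (
        BIAS
        + WEIGHTS["failed_parts"] * x.failed_parts
        + WEIGHTS["budget_overrun"] * x.budget_overrun
        + WEIGHTS["crowd_concern"] * x.crowd_concern
    )
    return 1.0 / (1.0 + math.exp(-z))


def flag_projects(records, threshold=0.5):
    """Return the names of projects whose risk score meets the threshold."""
    return [r.project for r in records if failure_risk(r) >= threshold]


records = [
    WeeklyInputs("A", failed_parts=0, budget_overrun=0.0, crowd_concern=0.1),
    WeeklyInputs("B", failed_parts=3, budget_overrun=0.2, crowd_concern=0.8),
]
flagged = flag_projects(records)
```

With these made-up numbers, only project "B" is flagged, which is the kind of signal a decision maker could act on before a failure occurs.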
In this effort, we focused on developing the main functions of the failure prediction prototype and evaluating whether our approach is a valid way to measure risk. We did so by testing our prototype and process with engineering student teams at Purdue University. During the first year of support, we have:
- Identified a set of potential causal or related factors that lead to failure and developed questions that aim to uncover the presence of these factors (Crowd inputs)
- Collected data for three semesters from design projects at Purdue University
- Completed statistical analyses to identify which Crowd inputs correlate with which types of failures
- Developed deep relational learning models that predict future project failures and failure causes
The report is organized as follows: First, we describe our process to identify factors that are associated with failures and to develop crowd signals that measure these factors. Second, we describe how we collected data from student design teams at Purdue University. Then, we present a series of mixed effects logistic regression models trained on the collected data and interpret them to identify which crowd signals correlate with an increased probability of a failure or failure cause occurring during a project. The last section describes a deep learning approach to predict future failures and failure causes. We conclude the report with a summary of our completed work during the first year of the task and our plans for extending this work towards the goal of completing our prototype.
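For readers unfamiliar with the mixed effects logistic regression models mentioned above, a generic random-intercept form is sketched here. The grouping by team and all symbols are assumptions made for illustration, not the exact specification used in the report.

```latex
\log \frac{\Pr(y_{ij} = 1 \mid \mathbf{x}_{ij}, u_j)}{1 - \Pr(y_{ij} = 1 \mid \mathbf{x}_{ij}, u_j)}
  = \beta_0 + \boldsymbol{\beta}^{\top} \mathbf{x}_{ij} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Here $y_{ij}$ indicates whether a failure (or failure cause) occurred in observation $i$ of team $j$, $\mathbf{x}_{ij}$ holds the crowd signals for that observation, $\boldsymbol{\beta}$ are the fixed effects shared by all teams, and $u_j$ is a team-specific random intercept. In such a model, a positive coefficient in $\boldsymbol{\beta}$ means the corresponding signal is associated with higher odds of failure.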