Process Optimization – Incident Management


Charts & Graphs, Visualization

Background

Last summer I worked on a product called Process Optimization at an enterprise product company. Process Optimization is a service that helps users find and fix inefficiencies in a company’s processes and discover opportunities to save resources (time, talent, and money). To optimize a process, one first needs to mine it. Process mining is a series of methods that starts by extracting and consolidating records of the execution of a business process. This collection of records is called an event log. These records are then analyzed using different types of visualizations, which allow process analysts and process owners to identify issues and opportunities for improvement such as bottlenecks, sources of waste, and root causes of service-level violations. For this report, we will consider a business process called Incident Management. Incident management is responsible for restoring services and resolving issues quickly. Its main goal is to keep employees productive and happy by ensuring that they can easily contact support to track and fix issues. Now, let’s optimize this process.
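To make the idea of an event log concrete, here is a minimal sketch in Python. The field names (case_id, activity, timestamp), the activity labels, and the records themselves are hypothetical and are not taken from the dataset used later in this report. The sketch groups events by incident and derives how long each incident took to resolve.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: one record per step in the life of an incident.
# Field names and values are illustrative, not taken from the real dataset.
event_log = [
    {"case_id": "INC-A", "activity": "New", "timestamp": "2016-03-01 09:00"},
    {"case_id": "INC-A", "activity": "Active", "timestamp": "2016-03-01 10:30"},
    {"case_id": "INC-A", "activity": "Resolved", "timestamp": "2016-03-02 09:00"},
    {"case_id": "INC-B", "activity": "New", "timestamp": "2016-03-01 11:00"},
    {"case_id": "INC-B", "activity": "Resolved", "timestamp": "2016-03-01 15:00"},
]


def traces(log):
    """Group events by case and order each case's events by time."""
    by_case = defaultdict(list)
    for event in log:
        ts = datetime.strptime(event["timestamp"], "%Y-%m-%d %H:%M")
        by_case[event["case_id"]].append((ts, event["activity"]))
    return {case: sorted(events) for case, events in by_case.items()}


def hours_between(events, start="New", end="Resolved"):
    """Hours from the first `start` event to the last `end` event of one case."""
    started = min(ts for ts, activity in events if activity == start)
    ended = max(ts for ts, activity in events if activity == end)
    return (ended - started).total_seconds() / 3600


# Resolution time in hours for every incident in the log.
durations = {case: hours_between(events) for case, events in traces(event_log).items()}
```

Per-incident durations like these are the raw material for the time-based metrics introduced in the Method section.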

Materials

To analyze an Incident Management process using visualization, we used one dataset and two pieces of software. The dataset is the Incident Response Log, an event log of an incident management process extracted from data gathered from the audit system of an instance of the ServiceNow™ platform used by an IT company. It was downloaded from:

Incident Response Log. (n.d.). Www.kaggle.com. Retrieved October 23, 2022, from https://www.kaggle.com/datasets/vipulshinde/incident-response-log?select=Incident_response.txt

The two pieces of software are Tableau, for visualization, and MS Excel, for spreadsheets. The data from the Incident Response Log is exported as a .csv file and loaded directly into Tableau, where it is then modeled into custom visualizations that benefit our use case.

Method

The main focus of this report is to create visualizations that help process analysts and process owners identify, track, and resolve high‑impact incidents in order to optimize an incident management process. To do that we need one main dataset; here we are using the ‘Incident Response Log’. Note that the dataset includes incidents opened from the end of February to mid-May, but for this report we focus on only two months, March and April, as the sample. Step one was therefore to import the dataset into Tableau and filter it to those two months.

The second and most important step is creating KPIs (key performance indicators) specific to incident management. KPIs are metrics used by businesses to track progress toward their goals. Tracking KPIs through visualization can help identify anomalies quickly, which provides a jumping-off point for larger questions. For incident management, the number of incidents and the average time to resolve are two useful metrics. Thus the two major metrics created were Incidents over time and MTTR (mean time to resolve).
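In this report the filter for step one was applied inside Tableau. As a rough equivalent outside Tableau, the sketch below keeps only incidents opened in March and April 2016. The column name opened_at, the day/month/year timestamp layout, and the sample rows are assumptions made for illustration and should be checked against the actual export.

```python
from datetime import datetime

# Assumed timestamp layout (day/month/year hour:minute); adjust to the export.
DATE_FORMAT = "%d/%m/%Y %H:%M"


def opened_in_window(rows, start, end):
    """Keep rows whose `opened_at` falls in the half-open range [start, end)."""
    kept = []
    for row in rows:
        opened = datetime.strptime(row["opened_at"], DATE_FORMAT)
        if start <= opened < end:
            kept.append(row)
    return kept


# Hypothetical rows; only the March and April ones should survive the filter.
rows = [
    {"number": "INC-1", "opened_at": "29/2/2016 01:16"},
    {"number": "INC-2", "opened_at": "1/3/2016 00:00"},
    {"number": "INC-3", "opened_at": "30/4/2016 23:59"},
    {"number": "INC-4", "opened_at": "1/5/2016 00:00"},
]

sample = opened_in_window(rows, datetime(2016, 3, 1), datetime(2016, 5, 1))
```

Using a half-open range avoids having to spell out the last minute of April as an end bound.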

Incidents over time is calculated as the [average number of incidents] within a given period.

MTTR is calculated as [total time taken to resolve all incidents] / [# of incidents] within a given period.

Calculated Fields
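The MTTR definition above can be written as a small function. This is a sketch of the arithmetic only, not the Tableau calculated field itself; the resolution times are hypothetical and assumed to be in hours.

```python
def mttr_hours(resolve_hours):
    """Mean time to resolve: total resolution time divided by incident count."""
    if not resolve_hours:
        return None  # undefined when the period has no resolved incidents
    return sum(resolve_hours) / len(resolve_hours)


# Hypothetical resolution times, in hours, for the incidents in one period.
period = [24.0, 4.0, 92.0]

incident_count = len(period)      # number of incidents in the period
period_mttr = mttr_hours(period)  # (24 + 4 + 92) / 3
```

Because MTTR is a mean, a single very long incident can move it sharply when the period contains few incidents.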

Step three was to graph both calculated metrics by week (for March and April only). Comparing the two metrics over time allows us to closely track and identify 1) which metric is not doing well and 2) in what time frame the anomaly occurs. To visualize and compare the data, we used line and bar graphs. After identifying the metric and time frame (the problem space), we filtered the assignment groups associated with these two factors to get more clarity on who or what is causing the inefficiency. A heatmap is used to visualize how each assignment group is performing, as it can clearly display the intensity of each group’s inefficiency within the selected time frame.
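The weekly grouping behind the line and bar graphs can be sketched as follows. We assume weeks start on Sunday, which matches the week labels used in this report (March 13 and March 20, 2016 are both Sundays), and that incidents are assigned to a week by their opening date. The incident values are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day):
    """Return the Sunday that begins the week containing `day`."""
    # date.weekday() is 0 for Monday through 6 for Sunday.
    return day - timedelta(days=(day.weekday() + 1) % 7)


def weekly_kpis(incidents):
    """Map each week start to (incident count, MTTR in hours)."""
    buckets = defaultdict(list)
    for opened, resolve_hours in incidents:
        buckets[week_start(opened)].append(resolve_hours)
    return {
        week: (len(hours), sum(hours) / len(hours))
        for week, hours in sorted(buckets.items())
    }


# Hypothetical incidents: (date opened, hours taken to resolve).
incidents = [
    (date(2016, 3, 14), 50.0),
    (date(2016, 3, 19), 90.0),
    (date(2016, 3, 20), 200.0),
]

kpis = weekly_kpis(incidents)
```

Each entry in kpis corresponds to one point on each of the two weekly graphs.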

Data visualization and Insights

A process analyst uses the KPI Dashboard to understand how the two metrics are performing for the specified time period, March to April, and first notices that the MTTR metric has not been doing well lately. The analyst identifies a sudden increase in the week of March 20, 2016: the MTTR has reached nearly 200 hrs, up from the initial 70 hrs a couple of weeks earlier. Some problem seems to be causing this upsurge. In theory, the MTTR metric is affected by two factors: 1) the number of incidents and 2) the average time taken by the assignment group to resolve an incident. Thus, to start with, the process analyst checks for dependencies and confirms that there has been a change in the Incidents over time metric close to that time frame (March 13, 2016).

KPI Dashboard, created using Tableau
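In this report the jump in MTTR was spotted by eye on the dashboard. A simple automated check could flag the same kind of week-over-week jump. The sketch below is one possible rule, not the method used here: the 1.5x threshold is an arbitrary assumption, and the weekly series is illustrative (70 and 200 echo the approximate readings described above, while the other values are hypothetical).

```python
def flag_spikes(weekly_mttr, ratio=1.5):
    """Return weeks whose MTTR is at least `ratio` times the previous week's."""
    flagged = []
    for (_, prev), (week, cur) in zip(weekly_mttr, weekly_mttr[1:]):
        if prev > 0 and cur >= ratio * prev:
            flagged.append(week)
    return flagged


# Illustrative (week start, MTTR in hours) pairs in chronological order.
weekly_mttr = [
    ("2016-03-06", 70.0),
    ("2016-03-13", 95.0),
    ("2016-03-20", 200.0),
    ("2016-03-27", 120.0),
]

spikes = flag_spikes(weekly_mttr)
```

A ratio against the previous week is only one option; comparing against a rolling average would be less sensitive to a single unusually quiet week.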

Now that overloading appears to be one of the factors, or the starting point, behind the change, the process owner further investigates the assignment groups based on their individual MTTR using the MTTR by Assignment Group (March 2016) heatmap, and finds that Group 61 (MTTR is 505.4), along with a few other assignment groups, might be affected and causing the inefficiency.

MTTR by Assignment Group (March 2016) Heatmap, created using Tableau
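The values behind a heatmap like this one are per-group MTTRs. A minimal sketch of that aggregation, with hypothetical group names and resolution times (in hours), ranks the groups from slowest to fastest:

```python
from collections import defaultdict


def mttr_by_group(incidents):
    """Return (group, MTTR in hours) pairs, slowest group first."""
    hours = defaultdict(list)
    for group, resolve_hours in incidents:
        hours[group].append(resolve_hours)
    means = {group: sum(h) / len(h) for group, h in hours.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical incidents: (assignment group, hours taken to resolve).
incidents = [
    ("Group A", 30.0),
    ("Group B", 400.0),
    ("Group A", 50.0),
    ("Group B", 600.0),
    ("Group C", 12.0),
]

ranking = mttr_by_group(incidents)
```

The group at the top of the ranking corresponds to the most intense cell in the heatmap.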

In the next step, using the time frame as a filter, one can dig deeper into Group 61 to identify who and what is causing this high impact. Using the MTTR by Group 61 (March 2016) & (Week of March 20, 2016) heatmaps, we finally found that incident INC0000433, handled by Responder 165 in Group 61, seems to be responsible for the major increase in MTTR in the week of March 20, 2016.

MTTR by Group 61 (March 2016) & (Week of March 20, 2016) Heatmap, created using Tableau
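The drill-down into a single group can be sketched as a ranking of that group's incidents by resolution time, together with each incident's share of the group's total. The field names, incident numbers, responders, and hours below are all hypothetical; the rows are assumed to have already been filtered to the time frame of interest.

```python
def top_contributors(incidents, group, limit=3):
    """Rank a group's incidents by resolution time, with share of group total."""
    in_group = [i for i in incidents if i["group"] == group]
    total = sum(i["hours"] for i in in_group)
    if total == 0:
        return []  # nothing to rank for this group
    ranked = sorted(in_group, key=lambda i: i["hours"], reverse=True)
    return [
        (i["number"], i["responder"], i["hours"] / total)
        for i in ranked[:limit]
    ]


# Hypothetical incidents, already filtered to the time frame of interest.
incidents = [
    {"number": "INC-1", "group": "Group X", "responder": "Responder 1", "hours": 900.0},
    {"number": "INC-2", "group": "Group X", "responder": "Responder 2", "hours": 60.0},
    {"number": "INC-3", "group": "Group X", "responder": "Responder 2", "hours": 40.0},
    {"number": "INC-4", "group": "Group Y", "responder": "Responder 3", "hours": 500.0},
]

top = top_contributors(incidents, "Group X")
```

When one incident accounts for most of a group's total resolution time, as INC-1 does in this toy data, it is a natural starting point for follow-up questions.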

Reflections

After finding the point of the mishap, the process analyst can now reach out to Group 61’s manager or the responder to further identify its cause and get answers to questions like “Was it the overloading? What effect did it have? Is there any other factor at play?” Overall, we found that using data visualization to monitor and track KPIs can help process analysts better optimize a business process.

The future direction of this experiment will be 1) testing whether the same visualizations work for other KPIs and 2) understanding the effects of hidden factors that are not documented in the dataset.

References

https://public.tableau.com/authoring/ProcessOptimization-IncidentManagement/MTTRLinegraph#1

https://www.kaggle.com/datasets/vipulshinde/incident-response-log

https://www.servicenow.com/products/incident-management.html#!

https://docs.servicenow.com/en-US/bundle/tokyo-now-intelligence/page/administer/process-optimization/concept/summary-insights-dashboard.html

https://www.atlassian.com/incident-management/kpis