The ITIL body of service management best practice guidance offers a clear purpose statement for incident management:
“…to minimize the negative impact of incidents by restoring normal service operation as quickly as possible.”
Source: AXELOS, Incident Management ITIL 4 Practice Guide (2020)
Here, an incident is defined as “An unplanned interruption to a service or reduction in the quality of a service.”
In this guide to ITIL incident management, discover what it entails, its best practices, how to implement it, and the reasons to adopt enterprise incident management.
Incident management as a capability has evolved since the first version of ITIL was introduced in 1989. While the process itself has stayed relatively similar, the ways in which incident management capabilities are made available to end users have changed considerably. As covered later, IT help desks, then IT service desks, and then ITSM tools have provided enabling capabilities that help IT support staff effectively manage incidents via a variety of contact channels, increasingly in ways that improve both the end-user experience and IT support personnel performance.
The addition of end-user self-help and self-service capabilities, and the introduction of “shift-left” strategies, then changed how incident management capabilities are employed, such that some of the incidents that were previously handled by:
Finally, artificial intelligence (AI), and machine learning in particular, has brought additional ways of identifying, reporting, and remediating incidents. Agile ways of working have also offered a new approach to managing incidents, which ITIL 4 includes as “swarming.” There’s more on both of these later.
No matter how incident management is invoked or enabled, the important thing to remember is that it helps keep IT and business services available and employees productive. Plus, if incident management is new to you and your organization, you might find that you already do some form of it, even if you call it something else (such as ticket management or issue handling).
While major incident management isn’t an area of focus here, it’s worth calling out its existence. ITIL 4 defines a major incident as: “An incident with significant business impact, requiring an immediate coordinated resolution.” It also offers a model for major incident management that starts with setting clear criteria for distinguishing major incidents from disasters and other incidents. It’s important to remember that what defines a major incident will likely vary between organizations based on a variety of factors from how they’re structured to their portfolio of critical business services.
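To make the idea of “clear criteria” more concrete, here is a minimal Python sketch of a major incident check. It is an illustration only: the `Incident` fields, the `CRITICAL_SERVICES` set, and the 500-user threshold are hypothetical assumptions rather than ITIL-defined values, and the separate distinction between major incidents and disasters is left out for brevity.

```python
from dataclasses import dataclass

# Hypothetical criteria: real thresholds should be agreed with business stakeholders.
USERS_AFFECTED_THRESHOLD = 500
CRITICAL_SERVICES = {"payments", "order-processing", "email"}


@dataclass
class Incident:
    service: str
    users_affected: int
    workaround_available: bool


def is_major_incident(incident: Incident) -> bool:
    """Return True if the incident meets the (hypothetical) major incident criteria."""
    # A critical business service is affected and users have no workaround.
    critical_and_blocked = (
        incident.service in CRITICAL_SERVICES and not incident.workaround_available
    )
    # Or the incident is widespread, regardless of which service is affected.
    widespread = incident.users_affected >= USERS_AFFECTED_THRESHOLD
    return critical_and_blocked or widespread


print(is_major_incident(Incident("payments", 40, False)))  # True
print(is_major_incident(Incident("intranet", 120, True)))  # False
```

Because the criteria live in plain constants, each organization can tune them to its own structure and portfolio of critical business services.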
Major incidents, as business-impacting incidents, are often handled as “all hands to the pumps” emergencies where significant IT resources are involved to help ensure both speedy remediation and resumption of business operations.
Sometimes, as well as knowing what something is, it’s also important to know what it isn’t. In the case of incident management, there are two similar but different ITSM capabilities that need to be seen, and treated, as distinct from incident management: service request management and problem management.
The first of these capabilities is best differentiated by focusing on what is being handled, i.e. service requests rather than incidents. ITIL 4 defines service requests as “A request from a user or a user’s authorized representative that initiates a service action which has been agreed as a normal part of service delivery.” It’s a complicated definition that’s best understood through example types:
The various versions of ITIL have long called out the importance of treating incidents and service requests separately due to their differing urgency.
The second of these capabilities is also best differentiated by starting with what’s being handled, i.e. problems rather than incidents. ITIL 4 defines a problem as “A cause, or potential cause, of one or more incidents.” Problem management is therefore focused on removing the causes of repeat incidents and reducing their impact.
As with service requests, a key differentiator between incident management and problem management is the need for urgency. While speed is important to both, the time needed for problem management activities, including the identification of root causes, means that problem management operates at a far slower pace than incident management. This can be thought of as fire prevention versus firefighting.
To find out more about problem management, please read the InvGate Definitive Guide to Problem Management. Plus, ITIL offers best practice guidance on all three of incident management, service request management, and problem management.
ITIL v3/2011 recommended that incidents be managed through a process that includes a number of formal steps or activities:
Continuous ownership, monitoring, tracking, and communication are involved throughout.
ITIL 4 updated this, albeit only slightly, to be an incident handling and resolution process that forms part of the incident management practice (which also includes the periodic incident review process):
An organization with a formal incident management capability gains many benefits, including:
In addition to the generic ITIL 4 changes, such as ITIL covering service management rather than only ITSM, the shift from processes to practices, and the introduction of the service value system, ITIL 4 brought some incident-management-specific changes. The key one is that there’s no longer a prescriptive incident management process; instead, organizations are encouraged to create their own value chain for incident management that takes a customer-centric, rather than IT-centric, view.
There’s also the concept of swarming, where incident handling is a collaborative effort. There are no tiered support groups and no escalation between them. Instead, an issue is owned by one individual through to its resolution, and that person brings in the right people to assist as needed.
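The difference between tiered escalation and swarming can be sketched as a difference in who owns a ticket. The following Python sketch is illustrative only; the `Ticket` class and the `escalate_tiered` and `swarm` functions are hypothetical and don’t reflect any particular ITSM tool’s data model.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    summary: str
    owner: str
    collaborators: list[str] = field(default_factory=list)
    history: list[str] = field(default_factory=list)


def escalate_tiered(ticket: Ticket, next_tier: str) -> None:
    """Tiered model: ownership moves to the next support group."""
    ticket.history.append(f"escalated from {ticket.owner} to {next_tier}")
    ticket.owner = next_tier


def swarm(ticket: Ticket, helper: str) -> None:
    """Swarming model: the owner keeps the ticket and brings in help."""
    if helper not in ticket.collaborators:
        ticket.collaborators.append(helper)
    ticket.history.append(f"{ticket.owner} brought in {helper}")


tiered = Ticket("VPN down", owner="service-desk")
escalate_tiered(tiered, "network-team")
print(tiered.owner)  # network-team

swarmed = Ticket("VPN down", owner="alex")
swarm(swarmed, "network-team")
swarm(swarmed, "security-team")
print(swarmed.owner, swarmed.collaborators)  # alex ['network-team', 'security-team']
```

In the tiered model the ticket’s owner changes with each escalation, whereas in the swarming model the owner stays fixed and only the list of collaborators grows.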
Incident management is one of the easiest ITSM processes/practices to justify, despite sometimes being viewed as “a cost of quality.” This justification is a great starting point for introducing incident management because it makes it more than “a good thing to do” and allows the scope of incident management coverage and its ambitions to be fleshed out as the costs and potential benefits are formulated, from staff numbers and operating times, through service-level targets, to the investment in ITSM tool enablement.
As mentioned earlier, there’s also the need to appreciate that your organization is probably already doing some form of incident management. It’s therefore important to thoroughly assess the status quo to see what can continue to be used, rather than risk losing something valuable in the mad rush to introduce a new way of working. For example, there might already be highly mature practices for remote support tool use.
In setting out, or improving, the scope of incident management, and in line with ITIL 4’s new focus, it’s important to understand how it will create business value. Ideally, this means moving what’s traditionally been an issue-fixing IT support capability to one that’s focused on enabling end users and improving their productivity.
Tap into the wealth of incident management practice available in ITIL and other resources, including ITSM tools. Ultimately, though, you need to create an incident management capability that’s best suited to your organization (rather than one that’s lifted from a completely unrelated organization). Take priority levels, for example: some benchmarks might fit, but others will need to be appraised and adjusted to suit your organization’s needs.
Most ITSM tool vendors have created incident management templates, workflows, and reporting capabilities based on such best practices, which may or may not fit your organization’s needs. It’s therefore important to change the ITSM tool’s “out-of-the-box” practices where they conflict with your organization’s incident management needs, while retaining the best practice that fits.
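A common way to derive priority levels is an impact/urgency matrix. The Python sketch below assumes a 3x3 matrix (1 = high, 3 = low) producing priorities 1 to 5, plus a resolution target per priority; every value in `PRIORITY_MATRIX` and `RESOLUTION_TARGET_HOURS` is a hypothetical benchmark that should be appraised and adjusted for your organization rather than adopted as-is.

```python
# Hypothetical 3x3 impact/urgency matrix where 1 = high and 3 = low.
# Keys are (impact, urgency); values are priority levels 1 (highest) to 5.
PRIORITY_MATRIX = {
    (1, 1): 1, (1, 2): 2, (1, 3): 3,
    (2, 1): 2, (2, 2): 3, (2, 3): 4,
    (3, 1): 3, (3, 2): 4, (3, 3): 5,
}

# Hypothetical resolution targets, in hours, for each priority level.
RESOLUTION_TARGET_HOURS = {1: 4, 2: 8, 3: 24, 4: 72, 5: 120}


def priority(impact: int, urgency: int) -> int:
    """Look up the priority level for an impact/urgency pair."""
    try:
        return PRIORITY_MATRIX[(impact, urgency)]
    except KeyError:
        raise ValueError("impact and urgency must each be 1, 2, or 3") from None


p = priority(impact=1, urgency=2)
print(p, RESOLUTION_TARGET_HOURS[p])  # 2 8
```

Keeping the matrix as data, rather than a formula, makes it easy to adjust individual cells when a benchmark doesn’t suit your organization.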
The setting of incident management priority levels, service-level targets, and performance metrics is another key activity that is best done in collaboration with key business stakeholders, while recognizing that a balance must be found between meeting expectations and wisely spending what will likely be limited funding. This includes starting to appreciate the business impact of what IT support does and doesn’t do; for example, whether a delay in resolving an issue costs your organization more than using additional IT resources to ensure a quicker resolution.
Look to leverage knowledge management, self-service, and automation to make incident management “better, faster, cheaper.” Knowledge management helps deliver quicker and better resolutions that are thus more cost-effective. Self-service, if done right, gives end users quicker access to (potentially automated) solutions that relieve the pressure on incident management staff and reduce labor costs. Automation is then the proverbial “icing on the cake,” speeding up work and resolutions and extending the capabilities of less-skilled people, including end users via self-help.
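The cost trade-off in that example reduces to simple arithmetic. This Python sketch assumes a flat hourly business loss while an issue remains unresolved, which is a simplification; the `cost_of_delay` and `compare_options` functions and all figures are hypothetical.

```python
def cost_of_delay(hourly_business_loss: float, hours_unresolved: float) -> float:
    """Business cost of an issue, assuming a flat loss per hour (a simplification)."""
    return hourly_business_loss * hours_unresolved


def compare_options(
    hourly_business_loss: float,
    standard_hours: float,
    expedited_hours: float,
    extra_resource_cost: float,
) -> dict:
    """Compare resolving at the standard pace with paying for extra IT resources."""
    standard = cost_of_delay(hourly_business_loss, standard_hours)
    expedited = cost_of_delay(hourly_business_loss, expedited_hours) + extra_resource_cost
    return {"standard": standard, "expedited": expedited, "expedite": expedited < standard}


# Hypothetical figures: 2,000 lost per hour, 8 hours at the standard pace,
# or 3 hours with 4,000 of additional resources.
print(compare_options(2000, 8, 3, 4000))
# {'standard': 16000, 'expedited': 10000, 'expedite': True}
```

In practice, business losses are rarely linear over time, so a real model would likely need input from the business stakeholders mentioned above.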
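As a rough illustration of how knowledge management can feed self-service, the Python sketch below suggests knowledge articles by scoring word overlap (Jaccard similarity) between an end user’s description and each article. The `KNOWLEDGE_BASE` contents and the `suggest_articles` function are hypothetical; ITSM tools are likely to use more sophisticated search or AI-based matching.

```python
import re

# Hypothetical knowledge base: article title -> keywords describing the article.
KNOWLEDGE_BASE = {
    "Reset your password": "forgot password reset account locked login",
    "Connect to the VPN": "vpn connect remote access client certificate",
    "Fix printer jams": "printer paper jam tray print queue",
}


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def suggest_articles(description: str, top_n: int = 2) -> list[str]:
    """Rank articles by Jaccard similarity between the description and each article."""
    query = tokens(description)
    scored = []
    for title, keywords in KNOWLEDGE_BASE.items():
        words = tokens(f"{title} {keywords}")
        union = query | words
        score = len(query & words) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, title))
    scored.sort(reverse=True)  # highest similarity first
    return [title for _, title in scored[:top_n]]


print(suggest_articles("I can't login, my account is locked"))  # ['Reset your password']
```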
Also, plan beyond the initial incident management capabilities using continual improvement to identify issues and to systematically find ways to improve the practice’s operations and outcomes. The aforementioned problem management should also be adopted to reduce the number of repeat incidents, even if only started in a small way to prove its value.
One could argue that incident management is a “backbone need” for ITSM tools, not only because it’s the most highly adopted ITSM process or practice, but because the evolution from IT help desk tools to ITSM tools started with ticketing for the management of IT issues, i.e. incident management. It’s not surprising, then, that ITSM tools offer a high degree of enablement for incident management that goes above and beyond the core of workflow enablement, knowledge management, self-service, and reporting and analytics.
Examples include native or third-party monitoring tools with event correlation capabilities for proactive issue detection; access to performance and device status data, along with configuration management database (CMDB) and asset data, to facilitate incident diagnosis; and native or third-party orchestration or remote administration capabilities for incident resolution.
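Event correlation for proactive detection can be sketched, in a much simplified form, as grouping related monitoring events into a single candidate incident. The Python sketch below assumes events are correlated only when they affect the same configuration item (CI) within a time window; the `Event` class, the `correlate` function, and the CI names are hypothetical, and real correlation engines may also use CMDB relationships and service topology.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    ci: str  # configuration item name, e.g. as recorded in a CMDB (hypothetical)
    message: str
    timestamp: datetime


def correlate(events: list[Event], window: timedelta) -> list[list[Event]]:
    """Group events on the same CI that arrive within `window` of the previous
    event in that group; each group is treated as one candidate incident."""
    groups: list[list[Event]] = []
    open_group_for_ci: dict[str, list[Event]] = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        group = open_group_for_ci.get(event.ci)
        if group is not None and event.timestamp - group[-1].timestamp <= window:
            group.append(event)  # related to an existing candidate incident
        else:
            group = [event]  # start a new candidate incident
            groups.append(group)
            open_group_for_ci[event.ci] = group
    return groups


t0 = datetime(2024, 1, 1, 9, 0)
events = [
    Event("db-01", "high CPU", t0),
    Event("db-01", "slow queries", t0 + timedelta(minutes=2)),
    Event("web-01", "HTTP 500 errors", t0 + timedelta(minutes=3)),
    Event("db-01", "disk full", t0 + timedelta(hours=2)),
]
for group in correlate(events, timedelta(minutes=10)):
    print(group[0].ci, [e.message for e in group])
# db-01 ['high CPU', 'slow queries']
# web-01 ['HTTP 500 errors']
# db-01 ['disk full']
```

Four raw events become three candidate incidents here, which is the kind of noise reduction that makes proactive detection manageable for IT support staff.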
As well as the many traditional incident management enablement capabilities of ITSM tools, there are increasingly opportunities for AI-enabled capabilities to help across all three of “better, faster, cheaper” incident resolution. These include:
While all of the above has been focused on incident management through an ITSM lens, enterprise service management (“the use of IT service management (ITSM) principles, practices, and capabilities by other business functions to improve their operations, services, experiences, and outcomes”) provides an extra dimension to the use of incident management. In fact, research by AXELOS and ITSM.tools found that incident management is the most commonly shared ITSM capability, shared by 78% of the organizations that already have an enterprise service management strategy in flight.
This makes it all the more important that your organization’s incident management practices are optimized. After all, sharing a sub-optimal ITSM practice in an attempt to improve the operations and outcomes of other business functions is a flawed approach.