ITIL 4 defines a problem as “A cause, or potential cause, of one or more incidents” and the purpose of problem management as being:
“…to reduce the likelihood and impact of incidents by identifying actual and potential causes of incidents and managing workarounds and known errors.”
Source: AXELOS, Problem Management ITIL 4 Practice Guide (2020)
This isn’t a massive change from the ITIL v3/2011 iteration, but it’s important to note the focus in the above on identifying actual and potential root causes, as well as on workarounds.
Problems can be identified across your IT ecosystem - in releases, changes, updates/patches, vendor products, user errors, and failures. However, the key source for problem identification is usually the analysis of incident data as part of what’s often called “proactive problem management.”
Other problem management terms need to be understood too, especially:
A problem that’s subject to problem management can be viewed as going on a journey from its identification through to some form of “resolution,” where the resolution might involve error control or the creation of a workaround. Between identification and resolution is what ITIL 4 calls problem control - the analysis of problems that might transform a problem into a known error with an associated workaround.
Resolution ideally, but not always, involves error control - resolving known errors via the corporate change management process or change enablement practice. If a problem cannot be fixed but a workaround is identified, the problem is classified as a known error with a workaround. It’s logged in a known error database (KEDB) and made available to support teams. Sometimes, though, no fix or workaround is identified. This is recorded as a “known problem” - with the information again made available to all support teams. A problem might move between these three states over time. For example, a workaround is made available while a problem awaits the required change.
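The three states described above can be sketched as a small data structure. This is only an illustrative sketch, not an ITIL-defined schema: the class names, field names, and state labels (`ProblemRecord`, `workaround`, `fix_implemented`, and so on) are assumptions made for the example, and a real ITSM tool will model problem records in its own way.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ProblemState(Enum):
    # Labels are illustrative, chosen to mirror the three states in the text.
    KNOWN_PROBLEM = "known problem (no fix or workaround yet)"
    KNOWN_ERROR_WITH_WORKAROUND = "known error with a workaround"
    RESOLVED = "resolved via error control"


@dataclass
class ProblemRecord:
    problem_id: str                    # hypothetical identifier format
    summary: str
    workaround: Optional[str] = None   # text that would be shared via the KEDB
    fix_implemented: bool = False      # True once the required change is in place

    @property
    def state(self) -> ProblemState:
        """Derive the current state from what is known about the problem."""
        if self.fix_implemented:
            return ProblemState.RESOLVED
        if self.workaround:
            return ProblemState.KNOWN_ERROR_WITH_WORKAROUND
        return ProblemState.KNOWN_PROBLEM


# A problem moving between states over time.
record = ProblemRecord("PRB-001", "Nightly batch job fails intermittently")
print(record.state.value)

record.workaround = "Re-run the job manually"  # workaround found, fix still pending
print(record.state.value)

record.fix_implemented = True                  # the required change has been made
print(record.state.value)
```

Deriving the state from the record's contents, rather than storing it separately, keeps the state and the underlying information from drifting apart as the problem moves along its journey.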
While all this might feel very process-based, it’s important to understand that the right people with the right skillsets are important too. These people will get a problem management capability working, and develop the process as they go, rather than simply defining a process and expecting it to work. They will also be able to successfully use problem management techniques to get to the root cause(s) of any given problem.
It’s also important to appreciate that problem management isn’t a standalone capability and needs to integrate well with other ITSM capabilities such as incident, change, and service configuration management. It’s also a key element for continual improvement.
Sometimes, as well as knowing what something is, it’s also important to know what it isn’t. In the case of problem management, there are two similar ITSM capabilities that need to be seen, and treated, as different to problem management:
In terms of the first of these, it’s not uncommon for organizations that are doing a “drains up” after a major incident has been resolved to believe that they “do” problem management. In a way they do, but they’re only doing a small part of what they could be doing. In particular, they aren’t proactively managing problems, merely reacting to issues as they arise (as a root cause of major incidents). However, following up on major incidents can be a great way to get a problem management capability off the ground.
The second - the confusion between incidents and problems - is something that has caused issues since the first version of ITIL. The ITIL 4 Foundation Edition publication helpfully explains that:
An incident is defined as “an unplanned interruption to a service or reduction in the quality of a service.” ITIL 4 defines a problem as “A cause, or potential cause, of one or more incidents.” As such, problem management is focused on removing the things that cause repeat incidents and their impact.
A key differentiator between incident management and problem management is the need for urgency. While speed is important to both, the reality is that the time needed to undertake problem management activities - including the identification of root causes - means that it operates at a far slower pace than incident management. This can be thought of as fire prevention versus fire-fighting.
It can be hard to know where to start when listing the benefits of problem management because different organizations will align more to some benefits than others. Whether it’s improving the end-user experience or reducing IT support costs or something in between, problem management is probably one of the most beneficial of all the ITSM practices. Example benefits are detailed below:
The key change between these ITIL versions is that ITIL 4 brought back problem and error control. Both were in earlier versions of ITIL but were not included in ITIL v3/2011. So, ITIL 4 - via problem control guidance - now offers help with:
And error control guidance includes:
ITIL 4 also introduces the concept of problem modeling, where prompting people to focus on various aspects of a service helps them to manage an issue more appropriately. For example, if a problem is thought to be caused by a hardware fault, addressing only this might not only fail to fix the problem, but it could also make things worse. By being open to the problem being caused by other factors - such as outdated working practices, technical debt, or compatibility issues - people can focus on the overall problem rather than being pulled in a single and potentially wrong direction.
Let’s start with the obvious. There’s a need for everyone involved to fully understand what problem management is. It’s the reason why there’s a “what problem management isn’t” section above - it’s easy to confuse problem management with one or more of major incident management, incident management, and even continual improvement. Creating a formal scope statement for problem management will help with this, as will agreeing some short, medium, and long-term objectives for problem management as part of articulating its business value.
Then it’s important to ensure that sufficient time is made available for problem management activities. Without this, problem management initiatives will fail. For example, if IT support staff are given problem management responsibilities in addition to their incident management role, then due to the fire-fighting nature of the latter it’s likely that they’ll never have time to provide the needed attention to problem management.
At least start to ensure that problem management capabilities are linked in with other existing ITSM capabilities, rather than being an operational island. These links with other capabilities - such as incident management, change management (or enablement), availability management, capacity management, and continual improvement - will provide bidirectional opportunities and benefits.
It’s fine to start small with problem management as long as this involves staying focused and communicating successes to help drive further problem management investments. This could be starting with a monthly report, based on the analysis of incident trends, that identifies the top five problems in terms of their adverse impact on the business.
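As a rough illustration of such a report, the sketch below groups incident records by suspected cause and ranks them by total business impact. The field names (`suspected_cause`, `impact_hours`), the use of lost hours as the impact measure, and the sample records are all assumptions made for the example; a real report would draw on whatever categorization and impact data your ITSM tool actually holds.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def top_problems(incidents: List[Dict], n: int = 5) -> List[Tuple[str, float, int]]:
    """Return the n suspected causes with the highest total impact.

    Each result is (suspected_cause, total_impact_hours, incident_count).
    """
    impact = defaultdict(float)
    count = defaultdict(int)
    for inc in incidents:
        cause = inc["suspected_cause"]
        impact[cause] += inc["impact_hours"]
        count[cause] += 1
    # Rank by business impact rather than by raw incident volume.
    ranked = sorted(impact, key=lambda cause: impact[cause], reverse=True)
    return [(cause, impact[cause], count[cause]) for cause in ranked[:n]]


# Made-up sample data, for illustration only.
incidents = [
    {"suspected_cause": "VPN drops", "impact_hours": 12.0},
    {"suspected_cause": "Printer driver", "impact_hours": 1.5},
    {"suspected_cause": "VPN drops", "impact_hours": 8.0},
    {"suspected_cause": "Email sync", "impact_hours": 4.0},
]

for cause, hours, occurrences in top_problems(incidents):
    print(f"{cause}: {hours} hours lost across {occurrences} incident(s)")
```

Ranking by impact rather than by incident count is a deliberate choice here: a cause that produces a handful of very disruptive incidents can matter more to the business than one that produces many trivial ones.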
In doing this it’s important to view problem management as an end-to-end capability. So, ensure that efforts aren’t over-focused on problem identification, say, such that problem resolution levels suffer. Performance metrics will help here if you quantify the difference that problem management efforts have made to business operations and outcomes rather than the volumes of problems identified.
Finally, in addition to having the right people with the right skills for problem management, ensure that they have suitable tools. This ranges from the various techniques for problem analysis, through having a KEDB, to the workflow and data management capabilities of ITSM tools.
A fit-for-purpose ITSM tool has likely been built with ITIL best practices as a blueprint. This extends to problem management, with a suitable ITSM tool offering a range of capabilities across:
At a function and feature level, this might include capabilities such as:
An ITSM tool might also offer some root cause analysis capabilities, even if just simple information collection forms that can be associated with problem records. Examples include:
While all of the above has been focused on problem management through an ITSM lens, enterprise service management - “the use of IT service management (ITSM) principles, practices, and capabilities by other business functions to improve their operations, services, experiences, and outcomes” - provides an extra dimension to the use of problem management. In fact, research by AXELOS and ITSM.tools found that problem management is a shared ITSM capability for 60% of the organizations that already have an enterprise service management strategy in flight.
This makes it all the more important that your organization’s problem management practices are optimized. After all, sharing a sub-optimal ITSM practice in an attempt to improve the operations and outcomes of other business functions is a flawed approach.