What is problem management?

ITIL 4 defines a problem as “A cause, or potential cause, of one or more incidents” and the purpose of problem management as being:

“…to reduce the likelihood and impact of incidents by identifying actual and potential causes of incidents and managing workarounds and known errors.”

Source: AXELOS, Problem Management ITIL 4 Practice Guide (2020)

This isn’t a massive change from the ITIL v3/2011 iteration, but it’s important to note the focus on identifying both actual and potential causes of incidents, as well as on managing workarounds.

Problems can be identified across your IT ecosystem - in releases, changes, updates/patches, vendor products, user errors, and failures. However, the key source for problem identification is usually the analysis of incident data as part of what’s often called “proactive problem management.”

Other problem management terms need to be understood too, especially:

  • Known error - “A problem that has been analyzed but has not been resolved.”
  • Workaround - “A solution that reduces or eliminates the impact of an incident or problem for which a full resolution is not yet available.”

What problem management entails

A problem that’s subject to problem management can be viewed as going on a journey from its identification through to some form of “resolution,” where the resolution might involve error control or the creation of a workaround. Between identification and resolution is what ITIL 4 calls problem control - the analysis of problems that might transform a problem into a known error with an associated workaround.

Resolution ideally, but not always, involves error control - resolving known errors via the corporate change management process or change enablement practice. If a problem cannot be fixed but a workaround is identified, the problem is classified as a known error with a workaround. It’s logged in a known error database (KEDB) and made available to support teams. Sometimes, though, no fix or workaround is identified. This is recorded as a “known problem,” with the information again made available to all support teams. A problem might move between these three states over time. For example, a workaround might be made available while a problem awaits the required change.
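
The three states described above can be sketched as a small classifier. This is only an illustrative sketch, not ITIL’s or any ITSM tool’s data model: the names `ProblemState`, `ProblemRecord`, and `classify` are hypothetical, and a real tool will likely track more states and fields.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ProblemState(Enum):
    KNOWN_PROBLEM = "known problem"                # no fix or workaround identified yet
    KNOWN_ERROR = "known error with workaround"    # workaround logged in the KEDB
    RESOLVED = "resolved"                          # permanent fix applied via a change


@dataclass
class ProblemRecord:
    title: str
    workaround: Optional[str] = None   # text shared with support teams, if any
    fix_implemented: bool = False      # True once the required change is deployed


def classify(problem: ProblemRecord) -> ProblemState:
    """Return the state a problem is currently in (hypothetical helper)."""
    if problem.fix_implemented:
        return ProblemState.RESOLVED
    if problem.workaround:
        return ProblemState.KNOWN_ERROR
    return ProblemState.KNOWN_PROBLEM


# A problem moving between the three states over time.
problem = ProblemRecord("VPN drops after laptop sleep")
print(classify(problem).value)   # known problem

problem.workaround = "Restart the VPN client after waking the laptop"
print(classify(problem).value)   # known error with workaround

problem.fix_implemented = True
print(classify(problem).value)   # resolved
```

Deriving the state from the record’s contents, rather than storing it separately, keeps the workaround text and the reported state from drifting apart.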

While all this might feel very process-based, it’s important to understand that the right people with the right skillsets are important too. These will get a problem management capability working, and develop the process as they go, rather than simply defining a process and expecting it to work. They will also be able to successfully use problem management techniques to get to the root cause(s) of any given problem.

It’s also important to appreciate that problem management isn’t a standalone capability and needs to integrate well with other ITSM capabilities such as incident, change, and service configuration management. It’s also a key element for continual improvement.

What Problem Management isn’t

Sometimes, as well as knowing what something is, it’s important to know what it isn’t. In the case of problem management, there are two similar but different ITSM capabilities that need to be seen, and treated, as distinct from problem management:

  • Major incident management
  • Incident management

In terms of the first of these, it’s not uncommon for organizations that are doing a “drains up” after a major incident has been resolved to believe that they “do” problem management. In a way they do, but they’re only doing a small part of what they could be doing. In particular, they aren’t proactively managing problems, merely reacting to issues as they arise (as a root cause of major incidents). However, following up on major incidents can be a great way to get a problem management capability off the ground.

The second - the confusion between incidents and problems - is something that has caused issues since the first version of ITIL. The ITIL Foundation: ITIL 4 Edition publication helpfully explains that:

  • Incidents are break-fix pieces of work, which cause a negative impact on people and as such need to be resolved so that normal work can continue.
  • Problems cause incidents. They need to be analyzed and investigated so that workarounds and resolutions can be identified which will, in turn, reduce the number and impact of future incidents.

While an incident is defined as “An unplanned interruption to a service or reduction in the quality of a service,” ITIL 4 defines a problem as “A cause, or potential cause, of one or more incidents.” Problem management is therefore focused on removing the things that cause repeat incidents and on reducing their impact.

A key differentiator between incident management and problem management is urgency. While speed is important to both, the time needed to undertake problem management activities - including the identification of root causes - means that it operates at a far slower pace than incident management. This can be thought of as fire prevention versus fire-fighting.

The need for, and benefits of, Problem Management

It can be hard to know where to start when listing the benefits of problem management because different organizations will align more to some benefits than others. Whether it’s improving the end-user experience or reducing IT support costs or something in between, problem management is probably one of the most beneficial of all the ITSM practices. Example benefits are detailed below:

  • Speeding up incident resolution and minimizing downtime and employee lost productivity. For example, where a KEDB is leveraged to access known error data including the associated workaround.
  • Reducing incident levels and the IT support time and effort they consume. Are your end users plagued by similar, potentially repeat, issues and are IT support staff fixing the same issues repeatedly? It’s not rocket science: if you don’t resolve an underlying problem, the resulting incidents are going to come back again (and again).
  • The reduction in incidents means less pressure and more time for IT support staff to focus on higher value-add work. This makes better use of potentially scarce IT resources and in reducing repeat incident tickets, not only does the fire-fighting workload reduce but the monotony of fixing the same issues also goes away.
  • Disruption prevention. Proactive problem management can highlight when a service interruption might happen such that the necessary action can be taken to ensure that it doesn’t. While it will reduce incident tickets the main benefit is preventing potentially business-crippling major incidents.
  • Reducing the costs of IT issues, whether through fewer incidents, incidents that are quicker and cheaper to resolve, or a lower business impact from major incidents.
  • Improving the business’s perceptions of the IT organization as a whole - even if the proactivity isn’t visible to end users, they will feel the benefit of fewer issues and quicker resolutions.

Problem Management in ITIL 4 vs. Problem Management in ITIL v3/2011

The key change between these ITIL versions is that ITIL 4 brought back problem and error control. Both were in earlier versions of ITIL but were not included in ITIL v3/2011. So, ITIL 4 - via problem control guidance - now offers help with:

  • Prioritizing problems
  • Understanding complexity
  • Creating workarounds

And error control guidance includes:

  • Making sure that analyzed problems are flagged as known errors
  • Evaluating the effectiveness of workarounds
  • Suggesting permanent resolutions

ITIL 4 also introduces the concept of problem modeling, where prompting people to focus on various aspects of a service helps them to manage an issue more appropriately. For example, if a problem is thought to be caused by a hardware fault, addressing this might not only fail to fix the problem, but it could also make things worse. By being open to the problem being caused by other factors - such as outdated working practices, technical debt, or compatibility issues - people can focus on the overall problem rather than being pulled in a single and potentially wrong direction.

How to start with Problem Management

Let’s start with the obvious. There’s a need for everyone involved to fully understand what problem management is. It’s the reason why there’s a “what problem management isn’t” section above - it’s easy to confuse problem management with one or more of major incident management, incident management, and even continual improvement. Creating a formal scope statement for problem management will help with this, as will agreeing on some short-, medium-, and long-term objectives as part of articulating the business value of problem management.

Then it’s important to ensure that sufficient time is made available for problem management activities. Without this, problem management initiatives will fail. For example, if IT support staff are given problem management responsibilities in addition to their incident management role, then due to the fire-fighting nature of the latter it’s likely that they’ll never have time to provide the needed attention to problem management.

At least start to ensure that problem management capabilities are linked in with other existing ITSM capabilities, rather than being an operational island. These links with other capabilities - such as incident management, change management (or enablement), availability management, capacity management, and continual improvement - will provide bidirectional opportunities and benefits.

It’s fine to start small with problem management as long as this involves staying focused and communicating successes to help drive further problem management investments. This could be starting with a monthly report, based on the analysis of incident trends, that identifies the top five problems in terms of their adverse impact on the business.
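
A report like this can start as a simple aggregation over exported incident data. The following is a minimal sketch under stated assumptions: the incident categories, the “lost minutes” impact measure, and the function name `top_problem_candidates` are all hypothetical, and your ITSM tool’s export will have its own fields.

```python
from collections import defaultdict

# Hypothetical monthly incident export: (category, minutes of lost working time).
incidents = [
    ("vpn disconnect", 45),
    ("email sync failure", 30),
    ("vpn disconnect", 60),
    ("printer offline", 10),
    ("password lockout", 15),
    ("email sync failure", 20),
    ("crm timeout", 90),
    ("password lockout", 15),
    ("shared drive unavailable", 25),
    ("vpn disconnect", 30),
]


def top_problem_candidates(incidents, limit=5):
    """Rank incident categories by total business impact, highest first."""
    impact_by_category = defaultdict(int)
    for category, lost_minutes in incidents:
        impact_by_category[category] += lost_minutes
    ranked = sorted(impact_by_category.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


for category, lost_minutes in top_problem_candidates(incidents):
    print(f"{category}: {lost_minutes} minutes lost")
```

Ranking by impact rather than by ticket count keeps the report aligned with the business effect of each candidate problem, so a single costly category can outrank a frequent but trivial one.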

In doing this it’s important to view problem management as an end-to-end capability. So, ensure that efforts aren’t over-focused on problem identification, say, such that problem resolution levels suffer. Performance metrics will help here if you quantify the difference that problem management efforts have made to business operations and outcomes rather than the volumes of problems identified.

Finally, in addition to having the right people with the right skills for problem management, ensure that they have suitable tools. This ranges from the various techniques for problem analysis, through having a KEDB, to the workflow and data management capabilities of ITSM tools.

How ITSM tools help with Problem Management

A fit-for-purpose ITSM tool has likely been built with ITIL best practices as a blueprint. This extends to problem management, with a suitable ITSM tool offering a range of capabilities across:

  • Provision of a KEDB that shares details of unresolved problems and their associated workarounds
  • Automated workflows for the problem management process - from problem ticket creation to closure, with alerts and notifications along the way
  • Integrations with other ITSM capabilities such as incident records, monitoring data, or configuration management database (CMDB) configuration item (CI) data
  • Analytics and reporting - from the ability to spot incident management trends for proactive problem management to providing performance metrics that demonstrate the success of the undertaken problem management activities

At a function and feature level, this might include capabilities such as:

  • The creation of discrete problem records
  • Automated two-way association of problem records to CIs
  • The ability to prioritize problems based on impact and urgency
  • The ability to assign and route problems to appropriate support staff
  • Automated tracking abilities and auto-escalation based on prioritization and agreed thresholds
  • The ability to create a request for change from a problem record with ongoing associations
  • The ability to make known error, problem, and workaround information available to support staff and to end users via self-service capabilities
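
Two of these capabilities, prioritization based on impact and urgency and auto-escalation against agreed thresholds, can be illustrated with a short sketch. The 3x3 matrix, the threshold values, and the function names below are assumptions chosen for illustration, not values prescribed by ITIL or taken from any particular ITSM tool.

```python
from datetime import datetime, timedelta

# Hypothetical matrix: (impact, urgency) -> priority, where 1 is the highest.
PRIORITY_MATRIX = {
    ("high", "high"): 1, ("high", "medium"): 2, ("high", "low"): 3,
    ("medium", "high"): 2, ("medium", "medium"): 3, ("medium", "low"): 4,
    ("low", "high"): 3, ("low", "medium"): 4, ("low", "low"): 5,
}

# Hypothetical agreed thresholds: how long a problem may stay open before escalation.
ESCALATE_AFTER = {
    1: timedelta(days=2),
    2: timedelta(days=5),
    3: timedelta(days=10),
    4: timedelta(days=20),
    5: timedelta(days=30),
}


def prioritize(impact: str, urgency: str) -> int:
    """Look up a problem's priority from its impact and urgency."""
    return PRIORITY_MATRIX[(impact, urgency)]


def needs_escalation(priority: int, opened_at: datetime, now: datetime) -> bool:
    """Return True when a problem has been open longer than its threshold."""
    return now - opened_at > ESCALATE_AFTER[priority]


priority = prioritize("high", "medium")
opened_at = datetime(2024, 1, 1)
print(priority)                                                  # 2
print(needs_escalation(priority, opened_at, datetime(2024, 1, 4)))   # False
print(needs_escalation(priority, opened_at, datetime(2024, 1, 8)))   # True
```

Keeping the matrix and thresholds as plain data makes them easy to review and adjust with the business, independently of the workflow logic.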

An ITSM tool might also offer some root cause analysis capabilities, even if just simple information collection forms that can be associated with problem records. Examples include:

  • The 5 Whys technique - where “Why?” is asked five times to drill down to establish a problem’s root cause(s)
  • The Kepner and Fourie Critical Thinking approach - which uses a template with structured questions to guide people through an analysis phase that encapsulates agreeing on what the problem is, gathering information, organizing and analyzing the information, and drawing a conclusion
  • Fault tree analysis - where people take a problem and visually log the possible reasons that could have caused it
  • Ishikawa diagrams - a technique that combines mind mapping and brainstorming to build a visual “cause and effect” diagram
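
As an illustration of a simple information collection form, the 5 Whys technique can be captured as a record that stores the chain of answers against a problem. This is a hypothetical sketch: the class `FiveWhysForm`, its fields, and the problem ID format are assumptions, not a feature of any specific ITSM tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FiveWhysForm:
    problem_id: str                  # hypothetical link to the problem record
    statement: str                   # what went wrong, as agreed by the team
    answers: List[str] = field(default_factory=list)

    def ask_why(self, answer: str) -> None:
        """Record the answer to the next "Why?" (at most five)."""
        if len(self.answers) >= 5:
            raise ValueError("The 5 Whys form holds at most five answers")
        self.answers.append(answer)

    def candidate_root_cause(self) -> Optional[str]:
        """The most recent answer is the current candidate root cause."""
        return self.answers[-1] if self.answers else None


form = FiveWhysForm("PRB-0001", "Nightly report was not delivered")
form.ask_why("The report job failed")
form.ask_why("The database connection timed out")
form.ask_why("The connection pool was exhausted")
print(form.candidate_root_cause())   # The connection pool was exhausted
```

The form deliberately treats the latest answer as a candidate only, since the analysis may need further “Why?” rounds or a different technique before a root cause is confirmed.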
Enterprise service management

Enterprise service management adds an extra dimension to problem management

While all of the above has been focused on problem management through an ITSM lens, enterprise service management - “the use of IT service management (ITSM) principles, practices, and capabilities by other business functions to improve their operations, services, experiences, and outcomes” - provides an extra dimension to the use of problem management. In fact, research by AXELOS and ITSM.tools found that problem management is a shared ITSM capability for 60% of the organizations that already have an enterprise service management strategy in flight.

This makes it all the more important that your organization’s problem management practices are optimized. After all, sharing a sub-optimal ITSM practice in an attempt to improve the operations and outcomes of other business functions is a flawed approach.

Frequently Asked Questions

Problem management seeks to identify and remove the underlying causes of incidents, prevent incidents and problems, and improve organizational efficiency by ensuring that problems are prioritized correctly according to impact, urgency, and severity.
The benefits of problem management include greater service availability through the elimination of recurring incidents, the containment of incidents before they impact other systems, and the elimination of incidents before they impact services through proactive problem management.
The purpose of incident management is to restore normal service as quickly as possible and minimize adverse impacts on business operations. Incident management is used to manage any event that disrupts or has the potential to disrupt any IT service and associated processes.
A known error is a problem that has been successfully diagnosed and for which either a workaround or a permanent resolution has been identified. Known errors should be documented in the knowledge base as articles so that a resolution is captured and shared across the organization and the user community.
