Why have an asset criticality registry at all if it’s already obvious what is “important”?

A registry records what each asset’s failure would endanger or cost, and gives all functions a single priority basis. That way M&R planning and the work queue reflect risk and consequences, not who shouts the loudest. This reduces unexpected downtime, rush purchases and constant rescheduling of planned tasks.

Why isn’t criticality equal to equipment cost?

Purchase or repair price does not show the cost of downtime. An inexpensive sensor or power module can stop a line, while an expensive unit may have a spare or be quickly replaced. Evaluate the damage from failure: people’s safety, production impact and financial losses.

What should be considered an asset: only shop-floor equipment or also IT and infrastructure?

Treat as an asset anything whose failure can stop operations, create a safety risk, or cause material costs. Beyond machines and lines, the critical items often include power supply, UPS, network, servers, switches, cooling, fire systems and control nodes. When infrastructure fails, production can stop just as surely as with a broken machine.

At what level of detail should criticality be assessed: line, machine or component?

Choose the level at which you actually make maintenance and spare-part decisions. Start at the line or machine level and add components only where consequences and spare-part strategy differ significantly. Too fine a breakdown creates noise; too coarse hides distinct risks inside one object.

Why split the assessment into Safety, Production and Finance (S/P/F)?

Three axes prevent mixing different types of consequences. Safety assesses injury, major incidents and mandatory stops; production looks at downtime, quality drops and delivery impacts; finance covers repair costs, lost revenue and contractual penalties. This makes it clearer why an asset can be high-criticality even if repair cost is moderate.

How can a 1–3 scale be agreed so ratings don’t “float”?

Describe levels with measurable, pre-agreed criteria rather than impressions. For each level set clear thresholds (e.g., downtime duration, type of safety incident, monetary loss ranges). Test the scale on 2–3 real examples so the team understands what constitutes a “1”, “2” or “3”.

How to calculate the final criticality class so there are fewer arguments in meetings?

The simplest rule is to take the maximum of S, P and F, since a high score on any axis requires attention. Teams often add a stop-factor: if S=3, overall criticality is always 3 regardless of the other axes. The key is to record a short justification so the rating can still be understood later.

How to use a 3x3 "probability × consequences" matrix in practice?

A 3x3 matrix combines failure probability and consequences to quickly set priorities. It is convenient to take consequences as the maximum of S/P/F; probability comes from failure history, inspections and operating conditions over a defined period (e.g., 12–24 months). If an asset falls into the red zone, it needs a concrete action plan: diagnostics, root-cause fixes and readiness to restore.

How does criticality affect M&R planning and SLA for work orders?

Criticality changes depth and frequency of maintenance: high-criticality assets get more frequent preventive maintenance, functional tests and tighter recovery SLAs; low-criticality items are handled more by condition-based work and grouped shutdowns. Criticality also sets the order of reactive work so emergencies don’t consume the whole plan.

How to link criticality to MRO spares and avoid forgetting "grey areas" like contractors?

The registry clarifies what to stock and what to order on demand. For high-criticality assets, spare-part stock must cover the supplier lead time and support the target recovery time, even if the part is inexpensive. Also document responsibilities for leased and contractor-owned assets and any agreements on access, spares and response; otherwise a shutdown turns into disputes instead of action.

Asset Criticality Register: Methodology for Maintenance and Spares

Why you need an asset criticality register

Criticality is an assessment of how dangerous and costly the failure of a particular asset would be for the business. It doesn’t measure “how important the equipment is by itself”, but what damage its downtime or failure would cause: to people’s safety, to production (or service delivery), and to finances. When that assessment is recorded in one place, it creates a common language between operations, production, HSE, finance and procurement.

A criticality register is needed so decisions stop depending on who shouts the loudest. Without it, requests that sound urgent often win over issues with higher actual risk. The result is firefighting work, constant rescheduling of maintenance, unexpected downtime and panic purchases “just in case”.

Criticality is not the same as cost. An expensive asset can be easy to replace or have a spare, while a cheap sensor can stop a line or create a safety hazard. Criticality is always about failure consequences and acceptable risk, not purchase price.

Decisions typically based on criticality include: which work goes first in the maintenance plan, where to require regulatory checks and more frequent monitoring, which MRO spares to keep on the shelf, how to protect the budget (what cannot be cut without increasing risk), and what response and recovery times to set for work orders.

A simple example: two pumps cost the same, but one feeds the fire-fighting system and the other a secondary circuit. If this isn’t reflected in the register, the team can spend time on visible breakdowns and miss quiet but dangerous risks.

What counts as an asset and at what level to assess

An asset in the register is anything whose failure can stop operations, create a safety risk, or cause noticeable costs. Don’t limit yourself to machines and lines. In practice, failures often happen in power, network, cooling or a key IT node rather than the machine itself.

Common entries include production equipment and key units (pumps, gearboxes, variable-frequency drives), IT and communications (servers, storage, switches, workstations, critical software-as-a-service), engineering infrastructure (power, UPS, compressors, ventilation, fire suppression), and safety elements (guards, sensors, emergency stops).

Choose the level of detail so that decisions about maintenance and spares can be made from it. If you plan work by “machine”, assessing every component creates noise. If spares and repairs are managed by assemblies, assessing only at the line level is too coarse.

A practical rule: start at the “line or machine” level and add components only where they truly differ in consequences or spare-part strategy. For example, for a server room it can be better to assess the unit “rack + power/UPS + switch” rather than each server separately, because that bundle determines downtime. If you manage local infrastructure (servers and workstations), record assets so it’s clear where a device is and who owns its availability.

The minimum fields in the register should answer four questions: “what is it”, “where is it”, “what is it for” and “who owns it”. The following are usually enough:

unique identifier and name
installation location (shop, area, server room)
function (what it supports: safety, production, service)
owner/responsible party and the maintenance provider
current criticality and date of last review
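
As an illustration, the minimum fields above could be captured in a small record type. This is a sketch only: the class name `RegisterEntry`, the field names and the sample pump are assumptions made for the example, not part of any particular asset management system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RegisterEntry:
    """One row of a criticality register (hypothetical field names)."""
    asset_id: str              # unique identifier
    name: str
    location: str              # shop, area, server room
    function: str              # what it supports: safety, production, service
    owner: str                 # responsible role, not a named person
    maintenance_provider: str
    criticality: int           # 1 (low) .. 3 (high)
    last_review: date

    def __post_init__(self) -> None:
        # Reject ratings outside the agreed 1-3 scale.
        if self.criticality not in (1, 2, 3):
            raise ValueError("criticality must be 1, 2 or 3")


# Sample values are invented for illustration.
entry = RegisterEntry(
    asset_id="PMP-001",
    name="Fire-water pump",
    location="Pump house",
    function="safety",
    owner="Area maintenance lead",
    maintenance_provider="In-house M&R",
    criticality=3,
    last_review=date(2024, 1, 15),
)
```

Validating the rating at creation time keeps out-of-scale values from entering the register unnoticed.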

Explicitly cover “grey zones”. Leased, contractor and shared engineering assets often fall out of scope despite having major impact. In the register note who actually fixes incidents, what SLAs/agreements exist, and whether there is access to spares and documentation. In a downtime event you won’t need to argue — you will act according to pre-agreed rules.

Three assessment axes: Safety, Production, Finance

An asset’s criticality almost always comes from three different questions: can it harm people or the environment, how much does it affect production, and what will its failure cost. Mixing these topics into one indicator without rules quickly turns a register into a collection of opinions.

Safety (S) covers consequences that cannot be “fixed later”: injuries or life‑threatening incidents, fire or explosion, leaks and environmental harm, and breaches of mandatory requirements and inspections. Even a rare failure can score high on this axis if the cost of error is unacceptable.

Production (P) covers how a failure hits operations: line or area downtime, capacity loss, increased scrap, missed delivery dates. Look not only at stoppages but also at hidden losses when equipment “sort of works” but quality fluctuates, speed drops, or changeovers take hours.

Finance (F) covers direct and indirect money: repair and spare costs, contractor services, logistics, lost revenue, penalties, and repairability (how quickly and at what cost the asset can be returned to service). The same downtime on different assets can cost very different amounts depending on product value or contract conditions.

Why agree on criterion meanings in advance

To make ratings comparable, fix short definitions before you discuss individual assets:

what counts as a safety incident (only injuries or also near-misses, environmental harm)
how to measure production impact (minutes of downtime, percent of scrap, delivery impact)
what monetary items to include in finance (only repair cost or also penalties and lost revenue)
what period to look at (shift, week, month)

Example: a furnace sensor failure can be “medium” for production (line down 30 minutes) but “high” for safety if it increases risk of overheating and fire. Such an asset should be prioritized despite small direct financial loss.

1–3 scales: how to agree clear levels

A 1–3 scale works when levels are described in plain terms and backed by facts: what we consider an incident, what downtime truly disrupts plans, and which losses matter. Usually one working session (M&R, production, HSE, finance) is enough to agree a shared vocabulary.

Key rule: scales must describe consequences (what happens if the asset fails), not failure probability. Probability can be considered separately in a risk matrix.

Below is an example of 1–3 levels for the three axes. Plug in thresholds appropriate to your site, but keep the logic and measurability.

Axis	Level 1 (low)	Level 2 (medium)	Level 3 (high)
Safety (S)	Minor incident without injury, first aid on site, no work stoppage	Injury causing loss of ability to work or a serious safety breach, requires area stop for investigation	Risk of severe injury/fatality, major accident, fire/explosion, mandatory prolonged stop and investigation
Production (P)	Downtime up to 2 hours, workaround or spare capacity exists, shift plan not disrupted	Downtime 2–24 hours, noticeable production loss, requires rescheduling or temporary fixes	Downtime over 24 hours or stoppage of a critical line, contract/plan breaches, recovery requires coordination across services
Finance (F)	Losses up to 1 million tenge or up to 0.5% of the area’s monthly costs	1–10 million tenge or 0.5–2% of area monthly costs	More than 10 million tenge or more than 2% of area monthly costs, plus possible penalties/compensations
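
To show what "measurable thresholds" can look like in practice, here is a minimal sketch that maps downtime and monetary loss to levels using the example figures from the table. The function names are hypothetical, only the numeric thresholds are covered (qualitative conditions such as "stoppage of a critical line" still need a human decision), and putting the boundary values (exactly 2 hours, exactly 1 million tenge) in the lower level is an assumption.

```python
def production_level(downtime_hours: float) -> int:
    """Map downtime to a P level using the example thresholds (2 h, 24 h)."""
    if downtime_hours <= 2:
        return 1
    if downtime_hours <= 24:
        return 2
    return 3


def finance_level(loss_tenge: float) -> int:
    """Map a loss in tenge to an F level (1 million, 10 million)."""
    if loss_tenge <= 1_000_000:
        return 1
    if loss_tenge <= 10_000_000:
        return 2
    return 3


print(production_level(8))       # 2
print(finance_level(20_000_000)) # 3
```

Replace the thresholds with the values agreed for your site; the point is that the same input always produces the same level.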

To make sure the team is aligned, add one real asset example for each level (S1, S2, S3, etc.) and record it in the register. If an asset is repeatedly disputed, the issue is usually blurred thresholds or forgetting to include an important loss item (for example, penalties, scrap, urgent purchases).

Step-by-step method to assign criticality

A registry works best when assessment is a short repeatable cycle, not a one-time attempt to rate everything.

Collect an asset list with minimum data for each: purpose, operating mode (24/7, shift, seasonal), availability of spares or bypass, and what you consider “stoppage” (full, partial, quality drop). Duplicates and inconsistent names usually surface at this step.
Describe realistic failure scenarios — not “everything can fail”, but 2–4 events that actually happened or are plausible: overheating, leak, sensor failure, power loss, seizure. If a scenario can’t be explained in one sentence and tied to a measurable sign (vibration, temperature, alarm), it’s too vague.
Rate consequences on S/P/F using the agreed 1–3 scales. It’s better to rate by scenario and take the worst reasonable outcome. Example: a cooling pump failure might be P=3 (line stops), S=2 (burn risk with manual workaround), F=2 (repair and downtime costs).
Assign a final criticality class by a simple rule (e.g., the maximum of S/P/F or a summed score) and always record justification: what exactly will stop, for how long, and what barriers exist (spare, protection, workaround).
Agree ratings cross-functionally (production, HSE, finance, M&R). Immediately assign an owner of the record and the process for changing the class so the method does not become “everyone rates as they please” after a management change.
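
The "rate by scenario and take the worst reasonable outcome" step can be sketched as follows. The scenario names and their individual scores are invented for illustration; they are chosen so that the result matches the cooling pump example above (S=2, P=3, F=2).

```python
from typing import Dict, Tuple

Scores = Tuple[int, int, int]  # (S, P, F), each on the agreed 1-3 scale


def worst_case(scenarios: Dict[str, Scores]) -> Scores:
    """Return the highest S, P and F found across all failure scenarios."""
    if not scenarios:
        raise ValueError("at least one failure scenario is required")
    # zip(*...) regroups the tuples by axis: all S values, all P, all F.
    s, p, f = (max(axis) for axis in zip(*scenarios.values()))
    return s, p, f


# Hypothetical scenarios for a cooling pump.
cooling_pump = {
    "seal leak": (2, 2, 1),
    "motor seizure": (1, 3, 2),
}
print(worst_case(cooling_pump))  # (2, 3, 2)
```

Keeping the per-scenario scores in the register, not only the final numbers, makes the justification easy to reconstruct later.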

How to compute the final criticality: simple rules

Final criticality should be a practical tool to decide what to do first and what cannot be postponed. The register should include one clear calculation rule, understood by production, M&R and HSE alike.

Two practical approaches

Option 1: maximum of the three scores (S, P, F). The logic is transparent: if consequences are high on any axis, the asset cannot be considered “medium”.

Option 2: weighted sum. Useful when different sites truly prioritize axes differently and that affects decisions.

To keep the rule usable in practice, people often agree on these basics:

Stop-factor: if S=3, overall criticality is always 3, regardless of P and F.
By default: if the stop-factor doesn’t apply, overall = max(S, P, F).
Use weights only when necessary and document them formally.
If the formula produces fractions (e.g., 2.4), round up to the next whole class (here, 3) to avoid debates over decimals.

Example. Pump: S=2, P=3, F=1 → overall = 3 by max rule. Another node: S=1, P=2, F=3 → overall = 3 because financial impact is high.
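
The maximum rule with the stop-factor fits in a few lines. This is a minimal sketch with a hypothetical function name; under a pure maximum rule the stop-factor is redundant, but it is kept explicit here so that it would still hold if the default rule were later replaced.

```python
def overall_criticality(s: int, p: int, f: int) -> int:
    """Overall class from Safety, Production and Finance scores (1-3)."""
    for score in (s, p, f):
        if score not in (1, 2, 3):
            raise ValueError("scores must be 1, 2 or 3")
    if s == 3:
        # Stop-factor: top safety consequences always give class 3.
        return 3
    return max(s, p, f)


print(overall_criticality(2, 3, 1))  # 3 (pump)
print(overall_criticality(1, 2, 3))  # 3 (financial impact is high)
print(overall_criticality(2, 1, 1))  # 2
```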

When and how to introduce weights

If you need weights, fix them in a formal order or written procedure, not in an informal chat. Example: S 50%, P 30%, F 20%. The calculation must be transparent and explainable in one sentence in a planning meeting.
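
A weighted calculation with the example weights (S 50%, P 30%, F 20%) might look like the sketch below. The function name is hypothetical. It keeps the stop-factor and rounds fractions up to the next whole class, in line with the 2.4 → 3 example above. Weights are stored as integer percentages so the arithmetic is exact.

```python
# Example weights from the text, as integer percentages (must sum to 100).
WEIGHTS = {"S": 50, "P": 30, "F": 20}


def weighted_criticality(s: int, p: int, f: int) -> int:
    """Weighted S/P/F class, rounded up, with the S=3 stop-factor."""
    if s == 3:
        return 3
    total = WEIGHTS["S"] * s + WEIGHTS["P"] * p + WEIGHTS["F"] * f
    # Ceiling division by 100 without floating point: 210 -> 3, 200 -> 2.
    return -(-total // 100)


print(weighted_criticality(2, 3, 1))  # 2.1 rounds up to 3
print(weighted_criticality(1, 2, 3))  # 1.7 rounds up to 2
```

Note the second result: with these weights an asset scored S=1, P=2, F=3 lands in class 2, whereas the maximum rule gives 3. That difference is exactly why weights need to be agreed formally before use.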

Example 3x3 criticality matrix (probability × consequences)

A 3x3 matrix helps quickly agree a priority: one axis is probability of failure, the other consequences. Decide in advance how you define “probability” and where the data comes from: failure frequency over 12–24 months, mean time between failures, inspection results, or operating conditions. If data is scarce, use expert judgement but mark it as an assumption.

Consequences can be treated two ways:

simpler: take the maximum of S/P/F
more precise: maintain three matrices (S, P, F) and combine them by the rule “red if red on any axis”

Example matrix (probability × consequences):

Consequences \ Probability	Low	Medium	High
Low	Green	Green	Yellow
Medium	Green	Yellow	Red
High	Yellow	Red	Red

Zones are usually read as:

Green: routine planned work, no firefighting.
Yellow: increased monitoring, adjust frequencies, targeted improvements.
Red: priority for M&R, diagnostics and root-cause work, higher attention to spares.

Small example: a pump has probability “medium” (2–3 failures per year) and consequences “high” due to safety risk and line stoppage. It falls into the red zone even if the direct financial loss is moderate. If a protective measure and staff training later reduce safety consequences, the zone can change — but only after recording the cause and review date.
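
The example matrix can be encoded as a simple lookup table. In this sketch the names `ZONES` and `zone` are hypothetical, and levels are coded as 1 = low, 2 = medium, 3 = high; consequences are assumed to have been reduced to a single level beforehand (for instance, the maximum of S/P/F).

```python
# (consequences, probability) -> zone, copied from the example matrix.
ZONES = {
    (1, 1): "green",  (1, 2): "green",  (1, 3): "yellow",
    (2, 1): "green",  (2, 2): "yellow", (2, 3): "red",
    (3, 1): "yellow", (3, 2): "red",    (3, 3): "red",
}


def zone(consequences: int, probability: int) -> str:
    """Look up the matrix zone for 1-3 consequence and probability levels."""
    try:
        return ZONES[(consequences, probability)]
    except KeyError:
        raise ValueError("levels must be 1, 2 or 3") from None


# The pump from the example: medium probability, high consequences.
print(zone(consequences=3, probability=2))  # red
```

Storing the matrix as data keeps the rule visible in one place and easy to review alongside the register.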

How to use criticality in M&R plans

Criticality lets you treat equipment differently. High-criticality assets get more preventive checks and inspections; low-criticality assets can be operated more on demand and grouped into larger downtime windows.

PM frequency and inspection depth

Simple logic: the higher the consequence, the earlier you want to catch degradation.

Class A (high criticality): more frequent preventive maintenance, mandatory functional tests, control measurements, root-cause analysis of repeated failures.
Class B (medium): standard procedures, selective checks, focus on typical weaknesses.
Class C (low): minimal inspections, more condition-based work, allowance to bundle with other outages.

For example, the same pump may be class A if it stops the line, and class C if a nearby spare exists and downtime doesn’t affect safety.

Work prioritization and SLA: reactive vs planned

Criticality shapes the work queue so reactive tasks don’t consume the plan.

A: emergency requests always take priority over planned tasks, and planned A tasks cannot be postponed without owner approval.
B: emergencies have priority; planned work can be postponed with a clear deadline and reason.
C: postponement and bundling are acceptable if risk doesn’t increase.

For SLAs define two times: response (how long until the technician starts work) and recovery (how long until the asset is back in service). Class A has the strictest targets; class C is more relaxed.
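
A sketch of how the two SLA times could be measured against per-class targets is shown below. All target values are invented placeholders, not recommendations, and measuring both times from the moment the request was reported is an assumption; your own SLA definition may start the clock differently.

```python
from datetime import datetime, timedelta

# Hypothetical targets per class: (response, recovery). Set your own.
SLA_TARGETS = {
    "A": (timedelta(minutes=30), timedelta(hours=4)),
    "B": (timedelta(hours=2), timedelta(hours=24)),
    "C": (timedelta(hours=8), timedelta(hours=72)),
}


def sla_check(cls: str, reported: datetime, started: datetime,
              restored: datetime) -> dict:
    """Compute response and recovery times and compare them to targets."""
    response = started - reported    # until the technician starts work
    recovery = restored - reported   # until the asset is back in service
    target_response, target_recovery = SLA_TARGETS[cls]
    return {
        "response": response,
        "recovery": recovery,
        "response_ok": response <= target_response,
        "recovery_ok": recovery <= target_recovery,
    }


result = sla_check(
    "A",
    reported=datetime(2024, 3, 1, 8, 0),
    started=datetime(2024, 3, 1, 8, 20),
    restored=datetime(2024, 3, 1, 13, 0),
)
print(result["response_ok"], result["recovery_ok"])  # True False
```

In this invented case the technician responded within target, but the five-hour recovery exceeded the four-hour class A placeholder.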

What to do with deferred work

Postponement is allowed only if you can show the risk did not increase. Don’t delay work when there are signs of progressive failure, repeated breakdowns, safety observations, or when the task secures a critical barrier (protection, lockout, alarm). For class A, delays are usually allowed only with compensating measures: temporary monitoring, load reduction, spare activation or expedited spare procurement.

How to link criticality with spares and MRO purchasing

With a clear criticality class, the warehouse stops operating “by habit”. The register answers what to keep, what to buy per need, and what not to stock.

A practical approach is to split SKUs by zone (red, yellow, green) and tie replenishment rules to that. Consider not only part price but lead time, replacement complexity and downtime consequences.

Red zone: mandatory safety stock (min–max), reorder point must cover lead time with a buffer.
Yellow zone: limited stock, review levels more often.
Green zone: order on demand, keep only fast-moving consumables.

Then set simple inventory parameters: min (minimum for lead time), reorder point, max (upper bound to avoid tying up cash). If lead times are long or unstable, raise min even for inexpensive parts.
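
One way to derive the three parameters is sketched below. The formulas are a common simplification and an assumption here, not the only method: min covers expected demand during lead time, the reorder point adds a buffer expressed in days of demand, and max is capped by a chosen number of days of cover. The function and parameter names are hypothetical.

```python
import math


def stock_levels(daily_usage: float, lead_time_days: float,
                 buffer_days: float, max_cover_days: float):
    """Return (min, reorder_point, max) in whole units."""
    def units(days: float) -> int:
        # round() guards against float noise such as 3.0000000000000004.
        return math.ceil(round(daily_usage * days, 9))

    minimum = units(lead_time_days)
    reorder_point = units(lead_time_days + buffer_days)
    # The upper bound should never fall below the reorder point.
    maximum = max(reorder_point, units(max_cover_days))
    return minimum, reorder_point, maximum


# Invented example: 0.5 units/day, 20-day lead time, 10-day buffer.
print(stock_levels(0.5, 20, 10, 60))  # (10, 15, 30)
```

For red-zone parts with long or unstable lead times, increasing `buffer_days` raises the reorder point without changing the rest of the logic.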

Also agree rules for interchangeability and "cannibalization" (taking parts from other equipment). This works only with an approved list of compatible items and a clear return policy: who and when must restore the donor. In IT, for example, temporarily moving a PSU from a less critical rack is sometimes allowed, but replacement must be ordered and donor restored by a defined deadline.

To avoid a “museum warehouse”, set disposal rules: no consumption for N months and no applicable assets, asset decommissioned, a more universal replacement appears, or storage cost and obsolescence risk exceed benefit. Review the SKU list together with criticality after major failures, upgrades or supplier changes.

Short example: how the register changes work priorities

Imagine a line with a single bottleneck and no spare. If the area stops, production halts. Before a register, requests often follow “who screams louder”, and the maintenance plan is built around habitual tasks rather than risk.

Take three assets on the area: a pump (pumping/cooling), a control cabinet (PLC, power, automation) and an operator industrial PC (HMI/SCADA). Rate S, P and F on 1–3.

Pump: S=2 (risk of overheating/leak but protections exist), P=3 (bottleneck stops the line), F=2 (moderate repair cost, some losses)
Control cabinet: S=3 (risk of dangerous failure, incorrect commands), P=3 (100% stoppage), F=3 (expensive repair and downtime)
Industrial PC: S=1 (indirect safety impact), P=2 (can switch to manual/backup terminal but slower), F=1 (replacement is relatively cheap)

Now apply a 3x3 probability × consequences matrix. Consequences can be taken as the maximum of S/P/F (or by a rule that S=3 always means consequences of 3).

Example mapping:

Pump: probability 2, consequences 3 → red
Control cabinet: probability 2, consequences 3 → red
Industrial PC: probability 2, consequences 2 → yellow

Practical changes: for the cabinet and pump the M&R plan becomes stricter, with condition monitoring, protection tests, procedures for critical components, and control of supplier and repair times. Spares become mandatory (VFDs, PSUs, I/O modules, seal and bearing kits), with target restoration times.

In the work queue, even small symptoms on red assets (temperature spikes, power dips, module errors) jump ahead of cosmetic issues on the HMI. The PC remains important but is handled through standardization: a standard disk image, a hot-swap spare unit, and a minimal number of distinct models.

Typical mistakes and pitfalls when implementing

The most common confusion: treating criticality as “how bad it is right now”. If a pump is noisy or a machine stops frequently, that’s condition and urgency. Criticality answers a different question: what happens if the asset fails at the worst moment. Sometimes an asset “works” now but its failure would cause injury or line stoppage.

A second trap is focusing only on repair price. An expensive spare is not always high risk, while a cheap sensor can stop production for a day. Include downtime, quality, safety and compliance, and count finance beyond the service invoice.

To keep ratings stable, lock scales and rules down and record justification. Six months later, when a shift supervisor or engineer changes, the register should still explain why an asset got its level.

What most often breaks the method

One department rates assets alone without production, HSE, quality and procurement input.
Decisions based on memory, not facts: downtime duration, failure frequency, available bypasses.
Rare but severe safety scenarios are rated low because “it never happened here”.
The register is created once and filed away, not updated after tech changes.

Another problem is the register not being linked to actions. If criticality doesn’t influence M&R prioritization, procedures, MRO levels and lead times, people quickly stop investing time in careful ratings.

Checklist and rules for reviewing the register

Treat the criticality register as a living document.

Basic checks:

A registry owner is assigned (role, not a person) and the change process is clear.
Scales and rules are fixed and uniform across areas.
Each item has a last review date and data source (incidents, downtime, costs).
Red assets don’t stay red without action: there is a work plan and response times.
Yellow assets are monitored and have clear triggers to escalate to red.

Then look at actions per zone. For red assets it must be clear what M&R work will be done, what minimum MRO stock is kept, who’s responsible and how fast we respond. For yellow assets, keep monitoring (walkaround, vibration, temperature, run hours) and define an escalation rule describing the symptoms that move the asset to red.

Tie review rules to events:

Regular: at least once a year, plus a short quarterly check of the top‑20 critical items.
After changes: modernization, technology change, load profile shifts.
After events: accidents, safety incidents, repeated failures.
Supply chain: supplier change, increased lead times, product discontinuation.
Economics: material change in cost of downtime, penalties, or energy costs.
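
The review rules above can be turned into a simple check that flags records due for review. This is a sketch with hypothetical names: the 365-day limit mirrors the "at least once a year" rule, and `event_since_review` stands for any of the listed triggers (change, incident, supply chain, economics) occurring after the last review.

```python
from datetime import date, timedelta


def review_due(last_review: date, today: date,
               max_age_days: int = 365,
               event_since_review: bool = False) -> bool:
    """True if the record is older than the limit or a trigger occurred."""
    too_old = (today - last_review) > timedelta(days=max_age_days)
    return event_since_review or too_old


today = date(2024, 3, 1)
print(review_due(date(2023, 1, 10), today))                           # True
print(review_due(date(2024, 1, 10), today))                           # False
print(review_due(date(2024, 1, 10), today, event_since_review=True))  # True
```

Running such a check on a schedule gives the registry owner a short list to act on, instead of relying on someone remembering to review.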

A sensible next step is a pilot on one area (50–150 assets): agree scales, run assessments and link results to the M&R plan and inventory norms, then move the registry into your asset management system. If IT infrastructure (servers, workstations) and system integration are needed, you can address them with a systems integrator. For example, GSE.kz as a manufacturer and integrator in Kazakhstan supplies workstations and servers and can assist with deployment and support of the infrastructure for such tasks.